[cherry-pick] Support fp16 in HybridParallel and fix bugs in HybridOptimizer #36703

haohongxiang · 2021-10-25T11:53:45Z

PR types

Others

PR changes

Others

Describe

[cherry-pick] Support fp16 in HybridParallel and fix bugs in HybridOptimizer

…to monitor FLAGS changing (#35849) * change __init__.py to adapt new FLAGS * test ci check, ready for revert * split __init__.py and FLAGS approval * Revert "test ci check, ready for revert" This reverts commit bbbd244.

… warnings. (#35839)

….6 (#35848) * fix bug

Add basic Cost Model, it uses executor to run program and profile it to get op time. This is an early basic version, we will add more functions in the future.

…35863)

* Optimization of pool2d grad, first commit. * remove useless print codes * refine codes * refine codes * seal more operation into template specialization * fix template struct error in MaxPool2dGrad. * Fix header including error * refine code with comment * Seal the param-preparation codes into function for common use. * Seal the param-preparation codes into function for common use. * Seal the param-preparation into function and make it common for other kernels * polish code and erase useless template specialization * Rerun trigger * rerun trigger

…35510) * Create stateful OneDNNAXPYHandler object. This makes it possible to call it multiple times without recreating the oneDNN primitives every time. * Prepare SGDOpKernel to reuse its implementation from OneDNN kernel. * OneDNN SGD kernel. * Update call to use new OneDNNAXPYHandler object api. * Setup seed in proper place. * Enable OneDNN kernel only for single case. * For dense param and sparse grad. * Small refactor. * Enable oneDNN by op attr or by cmd line flag. * Use int64_t type for number of elements. * Support dense param and grad from OneDNN kernel. * Enable SGD OneDNN kernel when using the MP BF16 optimizer. * Force non-copyable/movable OneDNNAXPYHandler. * Reuse OneDNNAXPYHandler for sparse tensors in SUM op. * Fix SFINAE rules. * Remove recording event inside AXPY. * Get rid of internal primitive caching. * Stop using the PP cache mechanism to store mem and primitive obj. * Handler obj store and reuse needed desc & prim * Do not derive from MKLDNNHandlerT
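The OneDNN SGD kernel above is built on an AXPY handler. As a minimal, framework-free sketch (assuming plain Python lists in place of tensors, and a row-indexed dict standing in for a sparse gradient), the SGD update `param <- param - lr * grad` is just AXPY with `a = -lr`. The names `axpy`, `sgd_dense` and `sgd_sparse` are hypothetical and are not Paddle or oneDNN APIs.

```python
from typing import Dict, List


def axpy(a: float, x: List[float], y: List[float]) -> None:
    """In place y <- a * x + y (the BLAS AXPY operation)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    for i, xi in enumerate(x):
        y[i] += a * xi


def sgd_dense(param: List[float], grad: List[float], lr: float) -> None:
    # Dense param, dense grad: param <- param - lr * grad, i.e. AXPY with a = -lr.
    axpy(-lr, grad, param)


def sgd_sparse(param: List[List[float]], grad_rows: Dict[int, List[float]], lr: float) -> None:
    # Dense param, sparse grad: only the rows present in the gradient are touched.
    for row, g in grad_rows.items():
        axpy(-lr, g, param[row])
```

Reusing one AXPY routine for both cases mirrors the idea in the commit log of sharing a single handler between the dense and sparse paths, though the real kernel's structure may differ.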

…5862)

* support extern third_party lapack on Linux/Windows/Mac * fix ci

* Modify H2D and D2H as kQueue::Sync * fix interface error

* refine gc for new_executor, test=develop * refine, test=develop * refine, test=develop * merge, test=develop

* fix feed, test=develop * delete one test case, test=develop

* support nnadapter and ascend310 * modify code * add anchor_generator convert test * add gelu convert test * add conv2d convert test * modify anchor_operator convert test * modify conv2d test * modify con2d convert test * modify conv2d convert test * modify conv2d convert test * modify conv2d test * fix WITH_PYTHON compile error * modify test file * modify test file * modify test file * modify test file * modify test file * modify test file * modify test file * modify test file Co-authored-by: xiaoxiaohehe001 <[email protected]> Co-authored-by: jiweibo <[email protected]>

#35879)

* support ernie-int8 test and prune op attribute test * remove using and use namespace * remove macro and use shell instead * Revert "remove macro and use shell instead" This reverts commit 615964b. * fix grammar error * fix shell error

…35837)

Add new APIs: paddle.linalg.det & paddle.linalg.slogdet. API aliases: paddle.det & paddle.slogdet

* init functional jacobian api * finish test with dtype float32 * add float64 test case * polish code * use atol=1e-5 with dtype float64 * fix for ci * set timeout for test_jacobian * polish API docstring * modify docstring

* refactored reshape multiop kernel and added flatten1/2 kernels * added formatting for flatten tests * CI fix * disabled reshape_kernel ops after successful CI run * minor fix

* A leap of try for cudaLaunchCooperativeKernel * fix bugs * Totally replace the lar cuda kernel * Fix bugs * fix code according to comments * fix codes according to review comments * adding some function overload * relocate the power operation.

* fix extra op for expand, expand_as, tile, unstack * fix unique unstack dim 0 * Update expand_v2_op.cc * fix unique_op format

* gloo hdfs set check & gloo connect retry * add vlog * print gloo connect addr & add vlog * . * modify vlog * modify vlog * modify vlog

* Add Basic CINN Runner Class * Add CinnCacheKey * Add Cache logic and improve CinnCacheKey * Modify as reviewer commented * Implement hash_combine to fix MAC build.

* Initial Commit * add unittest and add error information * modify doc * fix some error * fix some word * fix bug cudaDeviceProp* and modify error explanation * fix cudaDeviceProp* error and unittest samples * fix hip error and PADDLE_WITH_HIP * update style * fix error is_compiled_with_cuda * fix paddle.device.cuda.get_device_properties * fix error for multi thread safe * update style * merge conflict * modify after mentor review * update style * delete word * fix unittest error for windows * support string input and modify some code * modify doc to support string input * fix error for express information * fix error for express information * fix unittest for windows * fix device.startswith('gpu:') * format error and doc * fix after review * format code * fix error for doc compile * fix error for doc compile * fix error for doc compile * fix error for doc compile * fix error for doc compile * fix py2 error * fix wrong words and doc * fix _gpuDeviceProperties

…#36123)

…main (#36121) * read envs in flags_map * add flags to undefok

* fix dygraph double grad dtype error when calling for high differential scenario * reinvoke ci * add test for partial_engine.cc

remove recent linalg API in paddle.__init__; add the 'name' argument to some new linalg API interfaces; same change as #36112 in the develop branch

* [HIP] fix op not support AMD GPU bug, the flag PADDLE_WITH_ROCM is invalid * [HIP] fix op not support AMD GPU bug, the flag PADDLE_WITH_ROCM is invalid * [HIP] fix op not support AMD GPU bug * [hybrid] seed and dropout op support force-cpu * [hybrid] seed and dropout op support force-cpu * [hybrid] seed and dropout op support force-cpu * [hybrid] seed and dropout op support force-cpu * [hybrid] seed and dropout op support force-cpu * [hybrid] fix seed ci failed issue * add AsExtra for force_cpu of seed op

* ps gpu dump * remove log

* Add paddle.linalg.eig op * remove comments * remove comments * extend batch_size to the origin * add real times complex functor & destroy the backward complex output bug * terminate output diff when input real tensors * correct tiny doc errors * move functions from eig_helper to svd_helper and remove eig_helper * remove tensor.Resize * remove no longer used code * use existing lapack functions * reply review comments 21/27 * remove .cu as this op is only executed on CPU * remove const_cast & add const in argument list for read-only references * fix sample code error in CI * remove template typename Tbase and more * remove eig exposure in paddle.* * add 'name=None' in eig python implementation * handle the unittest * try to solve the unittest * solve CI coverage * remove no longer used code * polish API doc and more * reply review comments * polish unittest, commit plan B * polish unittest

Add sparse_attention OPs, python api will be added in next pr

* add roi_align in vision/ops.py

paddle-bot-old · 2021-10-25T11:53:47Z

Thanks for your contribution!
Please wait for the CI results first. See Paddle CI Manual for details.

sneaxiy and others added 30 commits September 18, 2021 15:55

Correct the return type of elementwise kernel to avoid many compiling warnings. (#35839)

71f051f

trt engine dtor when the last predictor dtor. (#35842)

8a239ae

make whl-build task only reuse local third_party cache (#35858)

a1b6ae2

FixEighOP; Unified MatrixEighFunctor function (#35812)

da44136

fix bug of module 'paddle' has no attribute 'distributed' for python3.6 (#35848)

d4cd259

* fix bug

Basic PR on Cost Model (#35774)

5ba9fe6

Add basic Cost Model, it uses executor to run program and profile it to get op time. This is an early basic version, we will add more functions in the future.
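As a toy illustration of the idea of measuring per-op time, the sketch below times each op over several runs and reports the mean in milliseconds. It assumes each op is a zero-argument Python callable; `profile_ops` is a hypothetical helper and is not the Cost Model API from #35774, which runs a whole program through the executor and reads op times from the profiler.

```python
import time
from typing import Callable, Dict, List, Tuple


def profile_ops(ops: List[Tuple[str, Callable[[], object]]], repeat: int = 5) -> Dict[str, float]:
    """Return the mean wall-clock time in milliseconds of each named op."""
    if repeat < 1:
        raise ValueError("repeat must be >= 1")
    costs: Dict[str, float] = {}
    for name, op in ops:
        start = time.perf_counter()
        for _ in range(repeat):
            op()  # run the op; its result is ignored, only the time matters
        elapsed = time.perf_counter() - start
        costs[name] = elapsed * 1000.0 / repeat
    return costs
```

Averaging over `repeat` runs smooths out timer noise for very short ops; a real cost model would also separate warm-up runs and device synchronization, which this sketch does not attempt.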

[hybrid] fix pipeline section program Parameter (#35847)

67c6363

increase test_imperative_auto_mixed_precision timePROPERTIES TIMEOUT (#35863)

e761751

add hard_sigmoid trt converter test cases (#35876)

9f88d32

support fp16 (#35888)

087c23a

add dilation check for conv (#35838)

7713430

fix bug of module 'paddle' has no attribute 'fluid' for python3.6 (#35862)

12ab017

disable tests for fft on windows with gpu (#35872)

5af6081

[2.2]support extern third_party lapack API on Linux/Windows/Mac (#35690)

ae65257

* support extern third_party lapack on Linux/Windows/Mac * fix ci

Modify H2D and D2H as kQueue::Sync and Polish Schedule logic (#35866)

fe35496

* Modify H2D and D2H as kQueue::Sync * fix interface error

refine gc for new_executor (#35764)

fab1a02

* refine gc for new_executor, test=develop * refine, test=develop * refine, test=develop * merge, test=develop

add timeline(recordevent) for new executor, test=develop (#35831)

5574c8c

fix feed for new executor (#35803)

4c2a06d

* fix feed, test=develop * delete one test case, test=develop

Add quant2 int8 lstm model test (#35887)

be4d002

fix: delete_quant_dequant_filter_op_pass, delete_quant_dequant_op_pass (#35879)

5cda6b2

[Inference] Support NNAdapter and ascend310 (#35226)

10e5304

refine FLAGS approval (#35904)

7ba6924

add no need buffer check, test=develop (#35790)

7ebbcbb

update paddle2onnx version to 0.8.2 in unittest_py/requirements.txt (#35837)

00e0e35

Det & Slogdet (#34992)

9ce45dd

Add new APIs: paddle.linalg.det & paddle.linalg.slogdet. API aliases: paddle.det & paddle.slogdet
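For reference, `slogdet` reports the sign of the determinant and the natural log of its absolute value, which avoids overflow or underflow when `det` itself is very large or very small. The pure-Python sketch below computes both with Gaussian elimination and partial pivoting. It is not Paddle's kernel, and returning a `(sign, logabsdet)` tuple is an assumption made for illustration; Paddle's actual output layout is not described in this PR.

```python
import math
from typing import List, Tuple


def slogdet(a: List[List[float]]) -> Tuple[float, float]:
    """Return (sign, log|det(a)|) via Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] for row in a]  # work on a copy
    sign = 1.0
    logabsdet = 0.0
    for k in range(n):
        # Pick the row with the largest pivot in column k for numerical stability.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            return 0.0, -math.inf  # singular matrix
        if p != k:
            m[k], m[p] = m[p], m[k]
            sign = -sign  # each row swap flips the sign of the determinant
        pivot = m[k][k]
        if pivot < 0.0:
            sign = -sign
        logabsdet += math.log(abs(pivot))
        # Eliminate entries below the pivot.
        for i in range(k + 1, n):
            f = m[i][k] / pivot
            for j in range(k, n):
                m[i][j] -= f * m[k][j]
    return sign, logabsdet


def det(a: List[List[float]]) -> float:
    """Recover det(a) from slogdet; exp(-inf) == 0.0 covers the singular case."""
    sign, logabsdet = slogdet(a)
    return sign * math.exp(logabsdet)
```

For `[[1, 2], [3, 4]]` the pivots after one row swap are 3 and 2/3, so the sign is -1 and `det` is -2.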

levi131 and others added 25 commits September 27, 2021 14:04

Add functional autograd API: jacobian (#35917)

ec2f68e

* init functional jacobian api * finish test with dtype float32 * add float64 test case * polish code * use atol=1e-5 with dtype float64 * fix for ci * set timeout for test_jacobian * polish API docstring * modify docstring
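The functional `jacobian` API computes J[i][j] = ∂f_i/∂x_j using autograd. A central-difference approximation is a common way to cross-check such results; the sketch below assumes `f` maps a list of floats to a list of floats, and `numerical_jacobian` is a hypothetical helper, not part of Paddle.

```python
from typing import Callable, List, Sequence


def numerical_jacobian(
    f: Callable[[List[float]], List[float]], x: Sequence[float], eps: float = 1e-6
) -> List[List[float]]:
    """Approximate J[i][j] = d f_i / d x_j with central differences."""
    point = [float(v) for v in x]
    m = len(f(point))
    jac = [[0.0] * len(point) for _ in range(m)]
    for j in range(len(point)):
        xp = list(point)
        xp[j] += eps  # step forward along coordinate j
        xm = list(point)
        xm[j] -= eps  # step backward along coordinate j
        fp, fm = f(xp), f(xm)
        for i in range(m):
            jac[i][j] = (fp[i] - fm[i]) / (2.0 * eps)
    return jac
```

Because finite differences carry truncation and rounding error, comparisons against an autograd Jacobian need a tolerance; the commit log above uses `atol=1e-5` for float64.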

Added flatten and flatten2 BF16/FP32 FWD/BWD kernels (#35892)

e427a0f

* refactored reshape multiop kernel and added flatten1/2 kernels * added formatting for flatten tests * CI fix * disabled reshape_kernel ops after successful CI run * minor fix

fix zero tensor for unique, unstack (#36021)

efd3538

* fix extra op for expand, expand_as, tile, unstack * fix unique unstack dim 0 * Update expand_v2_op.cc * fix unique_op format

gloo hdfs set check & gloo connect retry (#35750)

ae382d1

* gloo hdfs set check & gloo connect retry * add vlog * print gloo connect addr & add vlog * . * modify vlog * modify vlog * modify vlog

dlpack fix (#35817)

74ff59c

Add Basic CINN Runner Class (#35978)

6f18b04

* Add Basic CINN Runner Class * Add CinnCacheKey * Add Cache logic and improve CinnCacheKey * Modify as reviewer commented * Implement hash_combine to fix MAC build.
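The last bullet mentions `hash_combine`. A widely used formulation is Boost's, which mixes each new hash into a running seed. The Python sketch below assumes that formula, truncated to 64 bits, plus a hypothetical `cache_key_hash` built from a graph string and input shapes; whether `CinnCacheKey` hashes exactly these fields, or uses exactly this mixing function, is not stated in this PR.

```python
import zlib
from typing import Iterable, Tuple

MASK64 = (1 << 64) - 1


def hash_combine(seed: int, value_hash: int) -> int:
    """Boost-style hash_combine, truncated to 64 bits."""
    return (seed ^ ((value_hash + 0x9E3779B9 + (seed << 6) + (seed >> 2)) & MASK64)) & MASK64


def cache_key_hash(graph_repr: str, input_shapes: Iterable[Tuple[int, ...]]) -> int:
    # crc32 is used instead of hash() because Python str hashes are randomized per process.
    seed = hash_combine(0, zlib.crc32(graph_repr.encode("utf-8")))
    for shape in input_shapes:
        seed = hash_combine(seed, len(shape))  # mix in the rank so [(2, 3)] differs from [(2,), (3,)]
        for dim in shape:
            seed = hash_combine(seed, dim & MASK64)  # masking maps -1 (dynamic dim) to a non-negative value
    return seed
```

The shifts make the result depend on the order in which values are combined, so `hash_combine(hash_combine(0, 1), 2)` differs from `hash_combine(hash_combine(0, 2), 1)`.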

rename scale loss grad (#36162)

ad12814

fix bug of reduce_sum when src_dtype != dst_dtype and reduce_num == 1 (#36123)

d5268a6

[hybrid] optimizer sharding support optimize cast (#35878)

eef0a94

reduce calls to SizeOfType (#36110)

c719add

[re-submit] auto read all public envs from flags_map in paddle_gtest_main (#36121)

53f9768

* read envs in flags_map * add flags to undefok

py2 to py3 bug and iface fix for pslib (#36102)

0e07f20

[Bug fix] Fix dygraph double grad dtype error (#36125)

af4f018

* fix dygraph double grad dtype error when calling for high differential scenario * reinvoke ci * add test for partial_engine.cc

remove new linalg api in paddle.__init__ (#36151)

3bb4715

remove recent linalg API in paddle.__init__; add the 'name' argument to some new linalg API interfaces; same change as #36112 in the develop branch

[HeterPs]ps gpu dump (#36157)

97d3060

* ps gpu dump * remove log

[ROCM] bugfix for arg_min_max (#36098)

36791fd

Add sparse_attention api, test=develop (#35676)

6b587e9

Add sparse_attention OPs, python api will be added in next pr

add roi_align (#35102)

f068e08

* add roi_align in vision/ops.py

fix bugs in HybridParallelClipGrad of hybrid_parallel_optimizer

1c6938f
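The commit does not say what was wrong in `HybridParallelClipGrad`. As background only: global-norm clipping under hybrid parallelism has to add up squared gradients across model-parallel ranks for sharded parameters, while counting replicated parameters once rather than once per rank. The sketch below simulates ranks with nested lists; `global_norm_clip` is hypothetical, and the scale `clip_norm / max(global_norm, clip_norm)` is the usual global-norm clipping rule, assumed here rather than taken from this PR.

```python
import math
from typing import List


def global_norm_clip(
    shards_per_rank: List[List[List[float]]],
    replicated: List[List[float]],
    clip_norm: float,
) -> float:
    """Return the factor every gradient is multiplied by (hypothetical sketch)."""
    # Sharded grads: each rank owns a distinct slice, so per-rank squared sums are
    # added together (an all-reduce SUM across the model-parallel group in practice).
    dist_sq = sum(sum(v * v for g in rank for v in g) for rank in shards_per_rank)
    # Replicated grads are identical on every model-parallel rank, so count them once.
    rep_sq = sum(v * v for g in replicated for v in g)
    global_norm = math.sqrt(dist_sq + rep_sq)
    # Scale down only when the global norm exceeds clip_norm; otherwise the factor is 1.
    return clip_norm / max(global_norm, clip_norm)
```

Python floats are 64-bit, so this sketch accumulates in double precision; with fp16 gradients, accumulating the squared sums in a wider type is the usual way to avoid overflow, though how this PR handles that is not described here.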

update

3865e93

update

edc1988

haohongxiang closed this Oct 25, 2021

haohongxiang deleted the paddle2.2_cherrypick branch October 25, 2021 11:55

haohongxiang restored the paddle2.2_cherrypick branch October 25, 2021 12:05

haohongxiang deleted the paddle2.2_cherrypick branch October 25, 2021 12:06
