
[cherry-pick] Support fp16 in HybridParallel and fix bugs in HybridOptimizer #36703

Closed
wants to merge 110 commits into from

haohongxiang
Contributor

PR types

Others

PR changes

Others

Describe

[cherry-pick] Support fp16 in HybridParallel and fix bugs in HybridOptimizer

sneaxiy and others added 30 commits September 18, 2021 15:55
…to monitor FLAGS changing (#35849)

* change __init__.py to adapt new FLAGS

* test ci check, ready for revert

* split __init__.py and FLAGS approval

* Revert "test ci check, ready for revert"

This reverts commit bbbd244.
Add a basic cost model. It uses the executor to run a program and profiles the run to get per-op time.

This is an early, basic version; more functionality will be added in the future.
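The commit describes the cost model only at a high level: run the program with an executor and profile it to obtain per-op time. As a generic sketch of that idea in plain Python, assume each op can be modeled as a named callable. `profile_ops` and the `(name, fn)` representation are hypothetical and unrelated to Paddle's actual cost-model API.

```python
import time
from collections import defaultdict


def profile_ops(ops, runs=3):
    """Run every (name, fn) op `runs` times and return the average wall time
    per op name in seconds (ops sharing a name are summed within a run)."""
    totals = defaultdict(float)
    for _ in range(runs):
        for name, fn in ops:
            start = time.perf_counter()
            fn()  # stands in for executing one op of the program
            totals[name] += time.perf_counter() - start
    # Average over runs to smooth out timer noise.
    return {name: total / runs for name, total in totals.items()}
```

Averaging over several runs is a design choice of this sketch; a real profiler would also separate warm-up runs from measured ones.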
* Optimization of pool2d grad, first commit.

* remove useless print codes

* refine codes

* refine codes

* seal more operation into template specialization

* fix template struct error in MaxPool2dGrad.

* Fix header including error

* refine code with comment

* Seal the param-preparation codes into function for common use.

* Seal the param-preparation codes into function for common use.

* Seal the param-preparation into a function and make it common for other kernels

* polish code and erase useless template specialization

* Rerun trigger

* rerun trigger
…35510)

* Create stateful OneDNNAXPYHandler object.

This makes it possible to call it multiple times without recreating the
oneDNN primitives every time.

* Prepare SGDOpKernel to reuse its implementation from OneDNN kernel.

* OneDNN SGD kernel.

* Update call to use new OneDNNAXPYHandler object api.

* Setup seed in proper place.

* Enable OneDNN kernel only for single case.

* For dense param and sparse grad.

* Small refactor.

* Enable oneDNN by op attr or by cmd line flag.

* Use int64_t type for number of elements.

* Support dense param and grad from OneDNN kernel.

* Enable SGD OneDNN kernel when using the MP BF16 optimizer.

* Force non-copyable/movable OneDNNAXPYHandler.

* Reuse OneDNNAXPYHandler for sparse tensors in SUM op.

* Fix SFINAE rules.

* Remove recording event inside AXPY.

* Get rid of internal primitive caching.

* Stop using the PP cache mechanism to store memory and primitive objects.

* Handler object stores and reuses the needed desc & prim

* Do not derive from MKLDNNHandlerT
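The first bullets of this commit describe making OneDNNAXPYHandler stateful so the oneDNN primitives are created once and reused across calls. Reduced to plain Python, the pattern looks roughly like this: AXPY computes y ← αx + y, and the expensive setup (here only a counter standing in for primitive creation) happens once in the constructor. `AxpyHandler` is a hypothetical illustration, not the C++ class from the commit.

```python
class AxpyHandler:
    """Reusable y <- alpha * x + y handler.

    The (simulated) expensive setup runs once in __init__, so repeated
    calls reuse it instead of rebuilding it on every invocation."""

    setup_count = 0  # how many times the expensive setup has run

    def __init__(self, n, alpha):
        self.n = n
        self.alpha = float(alpha)
        # Stands in for creating the oneDNN memory descriptors and primitive.
        AxpyHandler.setup_count += 1

    def __call__(self, x, y):
        if len(x) != self.n or len(y) != self.n:
            raise ValueError("x and y must both have length n")
        # Update y in place, as AXPY does.
        for i in range(self.n):
            y[i] += self.alpha * x[i]
        return y
```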
* support extern third_party lapack on Linux/Windows/Mac

* fix ci
* Modify H2D and D2H as kQueue::Sync

* fix interface error
* refine gc for new_executor, test=develop

* refine, test=develop

* refine, test=develop

* merge, test=develop
* fix feed, test=develop

* delete one test case, test=develop
* support nnadapter and ascend310

* modify code

* add anchor_generator convert test

* add gelu convert test

* add conv2d convert test

* modify anchor_operator convert test

* modify conv2d test

* modify conv2d convert test

* modify conv2d convert test

* modify conv2d convert test

* modify conv2d test

* fix WITH_PYTHON compile error

* modify test file

* modify test file

* modify test file

* modify test file

* modify test file

* modify test file

* modify test file

* modify test file

Co-authored-by: xiaoxiaohehe001
Co-authored-by: jiweibo
* support ernie-int8 test and prune op attribute test

* remove using and use namespace

* remove macro and use shell instead

* Revert "remove macro and use shell instead"

This reverts commit 615964b.

* fix grammar error

* fix shell error
Add new APIs: paddle.linalg.det & paddle.linalg.slogdet

API aliases: paddle.det & paddle.slogdet
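The two new APIs are related in the usual way: slogdet returns the sign and the natural logarithm of |det|, so det = sign · exp(log|det|), and working in log space avoids overflow or underflow for large matrices. The pure-Python sketch below (LU decomposition with partial pivoting) only illustrates that relationship. The functions `slogdet` and `det` here are hypothetical stand-ins, not Paddle's kernels or the exact return format of `paddle.linalg.slogdet`.

```python
import math


def slogdet(matrix):
    """Return (sign, log|det|) of a square matrix via LU decomposition
    with partial pivoting. Pure-Python illustration only."""
    a = [list(map(float, row)) for row in matrix]  # work on a copy
    n = len(a)
    sign = 1.0
    logabsdet = 0.0
    for col in range(n):
        # Partial pivoting: choose the row with the largest |value| in this column.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0, -math.inf  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign  # each row swap flips the sign of the determinant
        diag = a[col][col]
        if diag < 0:
            sign = -sign
        logabsdet += math.log(abs(diag))
        # Eliminate entries below the pivot.
        for r in range(col + 1, n):
            factor = a[r][col] / diag
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return sign, logabsdet


def det(matrix):
    """Recover the determinant from the slogdet pair."""
    sign, logabsdet = slogdet(matrix)
    return sign * math.exp(logabsdet)
```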
levi131 and others added 25 commits September 27, 2021 14:04
* init functional jacobian api

* finish test with dtype float32

* add float64 test case

* polish code

* use atol=1e-5 with dtype float64

* fix for ci

* set timeout for test_jacobian

* polish API docstring

* modify docstring
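The commits above add a functional Jacobian API with float32 and float64 tests (atol=1e-5 for float64) but show neither. One common, generic way to sanity-check any Jacobian implementation is to compare it against central finite differences, J[i][j] ≈ (f_i(x + ε·e_j) - f_i(x - ε·e_j)) / (2ε). The helpers `numerical_jacobian` and `allclose` are hypothetical pure-Python illustrations, not Paddle APIs and not the test added in these commits.

```python
def numerical_jacobian(func, x, eps=1e-6):
    """Central-difference Jacobian of func: R^n -> R^m at point x.
    Returns an m x n list of lists with J[i][j] = d func_i / d x_j."""
    n = len(x)
    columns = []
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus = func(x_plus)
        f_minus = func(x_minus)
        # Column j holds the derivative of every output w.r.t. input j.
        columns.append([(fp - fm) / (2 * eps) for fp, fm in zip(f_plus, f_minus)])
    m = len(columns[0])
    # Transpose so rows index outputs and columns index inputs.
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def allclose(a, b, atol=1e-5):
    """Elementwise absolute-tolerance comparison of two nested lists."""
    return all(abs(p - q) <= atol for ra, rb in zip(a, b) for p, q in zip(ra, rb))
```

For f(x) = [x0·x1, x0 + 3·x1, x1²] at x = [2, 5], the analytic Jacobian is [[5, 2], [1, 3], [0, 10]], and `numerical_jacobian` agrees with it within atol=1e-5.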
* refactored reshape multiop kernel and added flatten1/2 kernels

* added formatting for flatten tests

* CI fix

* disabled reshape_kernel ops after successful CI run

* minor fix
* A first attempt at using cudaLaunchCooperativeKernel

* fix bugs

* Totally replace the lar cuda kernel

* Fix bugs

* fix code according to comments

* fix code according to review comments

* adding some function overload

* relocate the power operation.
* fix extra op for expand, expand_as, tile, unstack

* fix unique unstack dim 0

* Update expand_v2_op.cc

* fix unique_op format
* gloo hdfs set check & gloo connect retry

* add vlog

* print gloo connect addr & add vlog

* .

* modify vlog

* modify vlog

* modify vlog
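This commit adds retries when connecting Gloo, but the log does not state the retry policy. A generic connect-with-retry loop, with exponential backoff and one log line per failed attempt, might look like the sketch below. `connect_with_retry`, `max_retries` and `base_delay` are hypothetical names, not Gloo or Paddle APIs, and the backoff schedule is an assumption.

```python
import time


def connect_with_retry(connect, max_retries=5, base_delay=0.5, sleep=time.sleep):
    """Call connect() until it succeeds, retrying up to max_retries times
    with exponential backoff. Re-raises the last error if every attempt fails."""
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return connect()
        except ConnectionError as err:
            last_error = err
            if attempt == max_retries:
                break  # out of retries
            delay = base_delay * (2 ** attempt)
            print(f"connect attempt {attempt + 1} failed: {err}; retrying in {delay:.1f}s")
            sleep(delay)
    raise last_error
```

Passing `sleep` as a parameter keeps the loop testable without real delays.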
* Add Basic CINN Runner Class

* Add CinnCacheKey

* Add Cache logic and improve CinnCacheKey


* Modify as reviewer commented

* Implement hash_combine to fix MAC build.
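The commit message does not show the hash_combine implementation. A widely used formulation is Boost's `seed ^= h + 0x9e3779b9 + (seed << 6) + (seed >> 2)`, which mixes each element hash into a running seed so the combined key depends on element order. The Python sketch below emulates 64-bit `size_t` wraparound with a mask; that the CinnCacheKey hashing uses exactly this formula is an assumption, and `combine_all` is a hypothetical helper.

```python
MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit wraparound


def hash_combine(seed, value_hash):
    """Boost-style hash_combine on 64-bit unsigned integers."""
    seed ^= (value_hash + 0x9E3779B9 + ((seed << 6) & MASK64) + (seed >> 2)) & MASK64
    return seed & MASK64


def combine_all(hashes, seed=0):
    """Fold a sequence of element hashes into one order-sensitive hash."""
    for h in hashes:
        seed = hash_combine(seed, h)
    return seed
```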
* Initial Commit

* add unittest and add error information

* modify doc

* fix some error

* fix some word

* fix bug cudaDeviceProp* and modify error explanation

* fix cudaDeviceProp* error and unittest samples

* fix hip error and PADDLE_WITH_HIP

* update style

* fix error is_compiled_with_cuda

* fix paddle.device.cuda.get_device_properties

* fix error for multi thread safe

* update style

* merge conflict

* modify after mentor review

* update style

* delete word

* fix unittest error for windows

* support string input and modify some code

* modify doc to support string input

* fix error for express information

* fix error for express information

* fix unittest for windows

* fix device.startswith('gpu:')

* format error and doc

* fix after review

* format code

* fix error for doc compile

* fix error for doc compile

* fix error for doc compile

* fix error for doc compile

* fix error for doc compile

* fix py2 error

* fix wrong words and doc

* fix _gpuDeviceProperties
…main (#36121)

* read envs in flags_map

* add flags to undefok
* fix dygraph double grad dtype error when calling for the high-order differentiation scenario

* reinvoke ci

* add test for partial_engine.cc
remove recent linalg APIs from paddle.__init__;
add a 'name' argument to some new linalg API interfaces;
same change as #36112 in the develop branch
* [HIP] fix bug where ops did not support AMD GPUs; the flag PADDLE_WITH_ROCM is invalid

* [HIP] fix bug where ops did not support AMD GPUs; the flag PADDLE_WITH_ROCM is invalid

* [HIP] fix bug where ops did not support AMD GPUs

* [hybrid] seed and dropout op support force-cpu

* [hybrid] seed and dropout op support force-cpu

* [hybrid] seed and dropout op support force-cpu

* [hybrid] seed and dropout op support force-cpu

* [hybrid] seed and dropout op support force-cpu

* [hybrid] fix seed ci failed issue

* add AsExtra for force_cpu of seed op
* ps gpu dump

* remove log
* Add paddle.linalg.eig op

* remove comments

* remove comments

* extend batch_size to the origin

* add real-times-complex functor & fix the backward complex output bug

* terminate output diff when input real tensors

* correct tiny doc errors

* move functions from eig_helper to svd_helper and remove eig_helper

* remove tensor.Resize

* remove no longer used code

* use existing lapack functions

* reply to review comments 21/27

* remove .cu as this op is only executed on CPU

* remove const_cast & add const in argument list for read-only references

* fix sample code error in CI

* remove template typename Tbase and more

* remove eig exposure in paddle.*

* add 'name=None' in eig python implementation

* handle the unittest

* try to solve the unittest

* solve CI coverage

* remove no longer used code

* polish API doc and more

* reply to review comments

* polish unittest, commit plan B

* polish unittest
Add sparse_attention ops; the Python API will be added in the next PR
* add roi_align in vision/ops.py
@paddle-bot-old

Thanks for your contribution!
Please wait for the CI results first. See Paddle CI Manual for details.

@haohongxiang haohongxiang deleted the paddle2.2_cherrypick branch October 25, 2021 11:55
@haohongxiang haohongxiang restored the paddle2.2_cherrypick branch October 25, 2021 12:05
@haohongxiang haohongxiang deleted the paddle2.2_cherrypick branch October 25, 2021 12:06