
update #3

Merged
merged 146 commits into from
Apr 19, 2021

Conversation

AnnaTrainingG
Owner

PR types

PR changes

Describe

Avin0323 and others added 30 commits March 30, 2021 08:45
* Avoid raising warning while import paddle

* fix segment fault of set_value

* fix code style
* yolobox converter and plugin

* yolobox unittest

* add dynamic shape restriction

* fix git merge log
* fix batchnorm when input dims < 3

* add unittest for batchnorm dims = 2
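The batchnorm fix above can be illustrated with a shape-handling sketch (hypothetical helper code, not Paddle's actual kernel): a kernel that expects at least 3-D input can accept a 2-D `[N, C]` tensor by appending a trailing length-1 dimension and dropping it again afterwards.

```python
def pad_to_3d(shape):
    """Pad a shape to at least 3-D; return (new shape, was_padded)."""
    if len(shape) >= 3:
        return list(shape), False
    return list(shape) + [1] * (3 - len(shape)), True

def drop_padding(shape, was_padded):
    """Undo the padding for a 2-D [N, C] input."""
    return shape[:2] if was_padded else shape

padded, was_padded = pad_to_3d([8, 16])   # 2-D input [N, C]
assert padded == [8, 16, 1]               # viewed as [N, C, 1] by the kernel
assert drop_padding(padded, was_padded) == [8, 16]
```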
* add deprecated for softmax_with_cross_entropy, test=develop

* test for deprecated in english doc, test=develop

* test deprecated for softmax_with_cross_entropy in english doc, test=develop

* fix readme and English doc for cross_entropy, test=develop

* rm test for softmax_with_cross_entropy deprecated, test=develop

* update readme for CrossEntropyLoss, test=develop

* fix readme format, test=develop

* fix readme format, test=develop

* fix readme format for cross_entropy, test=develop

* add softmax_switch and fix soft_label for cross_entropy, test=develop

* 1) recover softmax_with_cross_entropy in fluid 2) change softmax_switch to use_softmax 3) add example for soft_label for cross_entropy, test=develop

* fix Example number for cross_entropy, test=develop

* fix code format, test=develop

* fix for CI-Coverage, test=develop

* fix for CI-Coverage, test=develop

* fix ci-coverage for Non-ASCII character '\xe2' in file, test=develop

* fix ci-coverage for Non-ASCII character '\xe2' in nn.layer.loss.py, test=develop

* update description for doc when use_softmax=False, test=develop

* fix some docs and code example for cross_entropy, test=develop

* delete redundant description for soft_label parameter of cross_entropy, test=develop

* fix some comment for test_cross_entropy_loss.py, test=develop
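The `use_softmax` switch described in the commits above can be sketched in a few lines of plain Python (a hypothetical re-implementation for illustration, not Paddle's actual `cross_entropy`): with `use_softmax=True` the input is raw logits and softmax is applied first; with `use_softmax=False` the input is assumed to already be probabilities.

```python
import math

def softmax(logits):
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(input_row, label, use_softmax=True):
    probs = softmax(input_row) if use_softmax else input_row
    return -math.log(probs[label])

logits = [2.0, 1.0, 0.1]
# Passing raw logits vs. pre-softmaxed probabilities yields the same loss.
assert abs(cross_entropy(logits, 0) -
           cross_entropy(softmax(logits), 0, use_softmax=False)) < 1e-12
```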
* Remove old custom OP to reduce whl package volume

* [Custom OP]Remove old custom OP to reduce whl package volume
… api (#31744)

* support multihead_matmul_fuse_pass_v3

* fix compile problems

* embedding_eltwise_ln pass support lookup_table_v2

* support matmul and matmul_v2 in qkv matmul
* bugfix for warpctc

* fix warpctc commit id

* fix warpctc commit id

* fix warpctc commit id

* fix warpctc commit id

* fix warpctc commit id

* fix WARPCTC_WITH_HIP invalid

* Add logs to find out why libwarpctc.so cannot be dlopened

* fix warpctc commit id

* fix unit test test_warpctc_op

* Optimize the failure log for dlopen

* Optimize the failure log for dlopen

* Delete extra changes

* fix warpctc commit id

* fix warpctc commit id

* Add is_compiled_with_rocm for test_warpctc_op

* fix warpctc commit id

* Cancel optimizing the dlopen failure reason; moved to the next PR since it makes Windows CI fail

* Cancel optimizing the dlopen failure reason; moved to the next PR since it makes Windows CI fail

* Cancel optimizing the dlopen failure reason; moved to the next PR since it makes Windows CI fail

* fix code style problems
* support minus-int idx to LayerList
* update layerlist test
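The "minus-int idx" support for LayerList above amounts to Python-list-style index normalization, sketched here with a hypothetical container (illustrative only, not Paddle's actual `LayerList` implementation): a negative index is shifted by the container length before lookup.

```python
class MiniLayerList:
    """Toy container demonstrating negative-index normalization."""

    def __init__(self, layers):
        self._layers = list(layers)

    def _normalize(self, idx):
        if idx < 0:
            idx += len(self._layers)     # e.g. -1 -> len - 1
        if not 0 <= idx < len(self._layers):
            raise IndexError("index out of range")
        return idx

    def __getitem__(self, idx):
        return self._layers[self._normalize(idx)]

    def __len__(self):
        return len(self._layers)

layers = MiniLayerList(["conv", "bn", "relu"])
assert layers[-1] == "relu"              # minus-int index reaches the last layer
assert layers[-3] == layers[0]
```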
* update cmake minimum version to 3.15, test=develop

* fix compilation error on Windows, test=develop

* fix compilation error on Windows, test=develop

* fix compilation error on Windows, test=develop
* fix split core

* format
* fix whl package push pypi

* add rst
* update compilation with C++14, test=develop

* fix compilation error in eigen, test=develop
* update eigen version to f612df27, test=develop

* fix compilation error, test=develop

* remove patch command in eigen, test=develop

* fix compilation error caused by call Eigen function with float16 and bfloat16, test=develop

* fix unittest error, test=develop

* fix unittest error caused by precision, test=develop

* remove patch files used by old version eigen, test=develop
* polish tensor pipeline. test=develop
* fix one error message

* fix an error message

* new fix three error messages

* new fix three error messages

* new fix some error

* new fix one error message
* [Parallel UT]improve Parallel UT level on Windows/Linux

* [Parallel UT]improve Parallel UT level on Windows/Linux

* [Parallel UT]Improve Parallel UT level on Windows/Linux

* [Parallel UT]Improve Parallel UT level on Windows/Linux

* fix CI
xingfeng01 and others added 22 commits April 14, 2021 16:33
* Initial draft for SGD BF16 kernel.

* Unit tests for SGD with BF16 data type.

* Add VLOG message to SGD BF16 op CPU kernel.

* Enhance error messages and error types.

* Refactor SGD op kernels to leverage some common code.

* Make it easier to add new kernel invoke code.

* Fix SGD op kernel for sparse grad.

* Unify quotes style.

* Fix error for ROCM compilation.

* Use specialized PADDLE_ENFORCE_xx functions.
* merge 31065

* Fix typo of selected_npus (#31230)

* merge 31249

* [NPU] Support npu op pow and pow grad (#31247)

* [NPU] Support npu op: (1) pow (2) pow_grad

* Support fp16

* Fix pow npu fp16 test (#31256)

* support list of list attribute for NPU (#31299)

* support list of list attribute for NPU

* fix compile problem

* fix reference

* [NPU] Support npu op: (1) slice (2) slice_grad (#31275)

* fix reading flags from env (#31329)

* merge 31347

* [NPU] Support npu op layer_norm and layer_norm_grad (#31310)

* init commit, add layer_norm npu kernel

* fix typo

* add unittest

* add unittest

* fix bug

* fix bug

* refine ut

* [NPU] add npu kernel for equal op (#31393)

* add npu kernel for equal op

* refine code

* add more ut

* update year

* [NPU] Support npu kernel for shape op  (#31427)

* add shape npu

* fix

* fix

* fix endif (#31431)

* Fix pow, use fillD instead of broadcast (#31433)

* Fix pow, refine code (#31440)

* fix cmake of cryptopp to avoid downloading every time (#31451)

* [NPU] squeeze and unsqueeze op for ascend (#31452)

Co-authored-by: root <[email protected]>

* Support npu kernel for gather op (#31458)

* add gather npu op

* code review done

* update python new line

* precommit

* fix review

* del commit

* 【NPU】add scale op for npu (#31499)

* add scale npu

* fix

* fix

* Support TensorFromVector, TensorToVector of bool type (#31518)

* support TensorFromVector, TensorToVector of bool type

* add ut

* fix compile problem

* 【NPU】support npu kernel for fill_constant op (#31521)

* add fill_constant npu

* add fill_constant npu

* fix

* cherry-pick 31422, solve conflict

* 【NPU】Support npu kernel for matmul op (#31544)

* add matmulv2_npu

* add matmul

* add matmul

* [NPU] Support npu op elementwise_mul and elementwise_mul_grad (#31571)

* [NPU] Support npu op elementwise_max (#31574)

* 【NPU】add relu op for  npu (#31515)

* add relu npu

* fixed

* fix

* 【NPU】Support npu kernel for reshape2 op (#31524)

* add reshape2 npu

* add reshape2

* [NPU] Support npu kernel for gather op fix bug (#31541)

* add gather npu op

* code review done

* update python new line

* precommit

* fix review

* del commit

* update gather_grad

* fix bug

* fix bug

* [NPU] Support npu kernel for amp_check_finite_and_unscale_npu op (#31457)

* Support npu kernel for amp_check_finite_and_unscale_npu op

* support EnforceNotMet exception

* fix exception bug

* modify python unittest

* precommit

* update c++ unittest

* fix review

* fix review

* [NPU] accuracy op (#31492)

* accuracy op

* fix license

* fix

* add test and fix bug

* [NPU] add Assign OP (#31561)

* add assign op

* add test assign npu test

* dele if def

Co-authored-by: oyjxer <[email protected]>

* [NPU] fix npu op elementwise_mul_grad (#31592)

* 【NPU】Support npu op gelu and gelu_grad (#31530)

* Support npu op gelu and gelu_grad

* Support npu op gelu and gelu_grad

* [NPU] fix assign cmake (#31595)

* fix gather_grad bug (#31607)

* [NPU] add range op (#31560)

* add range op

* fix codestyle; call GetSize directly

Co-authored-by: oyjxer <[email protected]>

* 【NPU】Support npu op elementwise_div and elementwise_div_grad (#31573)

* Support npu op elementwise_div and elementwise_div_grad

* Support npu op elementwise_div and elementwise_div_grad

* Support npu op elementwise_div and elementwise_div_grad

* [NPU] Support npu op log, log_grad, sqrt, sqrt_grad, square, tanh and tanh_grad (#31600)

* [NPU] Support npu op logicalnot_op (#31534)

* [NPU] Support npu op elementwise_min (#31575)

* [NPU] Support npu op elementwise_pow (#31576)

* [NPU] Support npu op table_lookup_v2 and table_lookup_v2_grad (#31399)

* [npu] support npu kernel `table_lookup_v2`

* clean up

* +python test

* +cmake

* clean up

* remove int8 kernel
+ python unittest for fp16

* clean up

* [NPU] support npu kernel for `less_than` (#31327)

* [npu] support npu kernel for `less than`

* remove int* kernel

* cleanup

* [NPU] Support npu kernel scatter op (#31624)

* Support npu kernel scatter op

* Add more test

* [NPU] fix allocator min chunk size (#31632)

* [NPU] Support NPU kernel cast op (#31635)

Co-authored-by: frankwhzhang <[email protected]>

* [NPU] add npu kernel for sgd (#31639)

* 【NPU】Support NPU kernel for reduce_sum op v2 (#31620)

* add reduce_sum

* fix broadcastd

* fix test

* fix

* add unsqueeze in reduce_sum

* add template

* add unittest for keep_dim

* test reduce_all

Co-authored-by: frankwhzhang <[email protected]>
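The `keep_dim` behavior exercised by the reduce_sum unit tests above can be sketched with a plain-Python helper (hypothetical, not the NPU kernel): reducing a 2-D matrix along axis 1, where `keep_dim` decides whether the reduced axis survives as length 1.

```python
def reduce_sum_axis1(matrix, keep_dim=False):
    """Sum each row; keep_dim=True keeps the reduced axis as length 1."""
    sums = [sum(row) for row in matrix]
    return [[s] for s in sums] if keep_dim else sums

m = [[1, 2, 3], [4, 5, 6]]
assert reduce_sum_axis1(m) == [6, 15]                     # shape [2]
assert reduce_sum_axis1(m, keep_dim=True) == [[6], [15]]  # shape [2, 1]
```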

* [NPU] add npu kernel for adam (#31644)

* add npu kernel for adam

* refine code

* disable test

* modify atol

* 【NPU】Support npu kernel for mul op (#31584)

* add mul

* add test mul

* [NPU] add npu kernel for softmax_with_cross_entropy (#31656)

* init

* fix bugs

* [NPU] add npu kernel for mean Op (#31562)

* update mean op

* update mean op

* give a better test activation

Co-authored-by: oyjxer <[email protected]>

* Revert "[NPU] add npu kernel for mean Op (#31562)" (#31665)

This reverts commit 468ac69.

* 【NPU】Add TensorCopy to NPU kernel for reduce_sum op  (#31667)

* update unittest

* add TensorCopy in npu grad kernel

* [NPU] Support npu op `expand` (#31405)

* [npu] support npu kernel  for `expand`

* [NPU] fix shape of dx in mul_grad (#31675)

* fix shape of dx

* refine code

* [NPU] add Increment op (#31563)

* add increment

* fix

* update test increment op inplace

* update increment op

* increment b = 2

Co-authored-by: oyjxer <[email protected]>

* [NPU] add NPU topk op (#31596)

* add topk op

* add cmake

* update topk npu op

* refactor func

* fix test not go npu TopKD bug

* NPUPlace(4) to NPUPlace(0)

* update comment

Co-authored-by: oyjxer <[email protected]>

* [NPU] Support NPU kernel sum op (#31671)

* [NPU] npu support `transpose` (#31486)

* cherry-pick 31564, solve conflict

* [NPU] Fix bug: Fix calculation errors of pow grad npu kernel (#31699)

* [NPU] Support testing grad of NPU ops in OpTest (#31697)

* [NPU] Support NPU kernel of stack op (#31711)

* [NPU] Remove redundant ctest of top_k_op_npu_test (#31718)

* [NPU] fix reshape npu op kernel (#31726)

* rename npu op file

* fix reshape

* [NPU] change transpose to transpose2 (#31734)

* change transpose to transpose2

* fix bug

* [NPU] Support  mean npu kernel (#31729)

* [NPU] fix some bugs of npu op (#31739)

* fix softmax

* fix mean

* fix lookup_table_v2

* 【NPU】Fix npu kernel elementwise_div_grad  (#31753)

* [NPU] fix the grad kernel diff bug of gather op (#31757)

* fix gather grad kernel diff

* fix gather grad kernel diff

* fix gather review bug

* 【NPU】Fix reshape test & add grad test (#31776)

* fix

* fix

* [NPU] support fp16 for npu accuracy op (#31797)

* [NPU] support list of tensor input (#31801)

* support list of tensor as npu input

* add comment

* fix typo

* fix typo

* [NPU] add npu kernel for concat op (#31695)

* add npu kernel for concat op

* add npu kernel for concat op

* refine code

* update

* refine concat_grad

* [NPU] Support npu kernel for op elementwise_floordiv (#31822)

* [NPU] fix bug of lookup_table_v2_grad (#31834)

* [NPU] support default stream (#31510)

* [NPU] support mixed precision input for npu layer norm (#31847)

* support mixed precision input for npu layer norm

* fix layer_norm npu kernel

Co-authored-by: zhiqiu <[email protected]>

* 【NPU】Support npu kernel for update_loss_scaling op (#31830)

* add update_loss_scaling_npu NPU kernel

* change TensorFromVec to Memset

* fix compile problem (#31850)

* [NPU] support npu for conditional_block op (#31854)

* 【NPU】Add int dtype kernel for reshape2 op (#31864)

* fix

* fix

* [NPU] fix some op bugs (#31855)

* fix some op bugs

* fix some bugs

* follow comments

* fix log level

* add ut

* [NPU] support fp16 of input for api pow (#31871)

* [NPU] add npu kernel for truncated_gaussian_random op (#31654)

* init

* add todo

* add npu kernel for truncated_gaussian_random

* add sync

* fix concat_grad

* fix typo

* fix compile

* fix compile

* fix compile

* fix compile

* fix compile

* fix compile

* fix code style

* fix code style

* fix code

* Fix op test (#32231)

* fix conditional block (#32243)

* fix style code

Co-authored-by: xiayanming <[email protected]>
Co-authored-by: Leo Chen <[email protected]>
Co-authored-by: liym27 <[email protected]>
Co-authored-by: Reventon_L <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: oyjxer <[email protected]>
Co-authored-by: yinhaofeng <[email protected]>
Co-authored-by: OleNet <[email protected]>
Co-authored-by: Meiyim <[email protected]>
Co-authored-by: oyxuan-11 <[email protected]>
Co-authored-by: pangyoki <[email protected]>
fix test sync_with_cpp (#32212)
* custom python backward

* polish up the code

* polish up the code

* polish up the code.

* Fix code format and comments.

* Delete redundant files.

* add unittest.

* edit unittest.

* edit unittest.

* Remove redundant header files.

* Improve coverage and remove redundant code.

* support saving for backward.

* polish code according to comments.

* Add support type for PyLayer.

* Modify the DOC.

* polish Doc.

* polish Doc.

* polish Doc.

* polish Doc.

* polish Doc.

* polish Doc.

* polish code and make the code robust.

* Modify the code format.
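The custom-backward / "saving for backward" pattern in the PyLayer commits above can be sketched as follows (hypothetical scaffolding for illustration, not Paddle's actual `paddle.autograd.PyLayer`): `forward()` stashes values on a context object that `backward()` later reads.

```python
class Context:
    """Carries values from forward() to backward()."""

    def save_for_backward(self, *values):
        self.saved = values

class SquareLayer:
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)      # stash input for the backward pass
        return x * x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved
        return 2 * x * grad_out       # d(x^2)/dx = 2x, chained with grad_out

ctx = Context()
assert SquareLayer.forward(ctx, 3.0) == 9.0
assert SquareLayer.backward(ctx, 1.0) == 6.0
```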
* fix two error messages

* fix two error messages

* fix error

* fix error

* fix error

* fix error

* fix some error messages

* fix some error

* fix error

* fix some error

* fix some error

* fix some error

* fix one error

* fix some error
* add IsInitialized

* rm additional log and add tanh double grad

* rename is_initialized
* pscore support heterps

* fleet cmake

* fleet wrapper

* macro

* solve conflict

* solve conflict

* add unittest

* paddle enforce

* unittest

* unittest

* unittest
* [ROCM] bugfix for test_conv_transpose_nn_grad

* [ROCM] bugfix for test_batch_norm_op_v2

* [ROCM] bugfix for test_empty_like_op

* [ROCM] bugfix for test_conv_transpose_nn_grad
* add index_dataset and index_sampler for tree-based model
* make hapi support amp, and add unittest

* make unittest only support GPU

* update parameters for amp in hapi.Model

* update hapi.Model.prepare interface, and update unittest

* fix test_model.py unittest bug

* add grad clear in dygraph

* use_fp16_guard defaults to True, which could avoid nan

* add input check, and add internal doc link to low level api

* update doc, and decrease the sample num of dataset to avoid timeout

* make hapi amp param  support str 'O1' or 'O2'

* resume calling, modify the code of the check part

* upgrade the usage of Fleet API, and disable 'pure_fp16' param
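Accepting either an `'O1'`/`'O2'` string or a config dict for the hapi amp parameter, as described above, boils down to a small normalization step (hypothetical helper with illustrative names, not `hapi.Model`'s real internals):

```python
def normalize_amp_config(amp):
    """Normalize an amp setting: None, an 'O1'/'O2' level string, or a dict."""
    if amp is None:
        return None
    if isinstance(amp, str):
        level = amp.upper()
        if level not in ("O1", "O2"):
            raise ValueError("amp level must be 'O1' or 'O2'")
        return {"level": level}
    return dict(amp)                  # already a config dict; copy it

assert normalize_amp_config("o1") == {"level": "O1"}
assert normalize_amp_config({"level": "O2", "use_fp16_guard": True})["level"] == "O2"
```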
* support ernie trt-int8 for inference

* fix reshape
@AnnaTrainingG AnnaTrainingG merged commit 43f53fe into AnnaTrainingG:develop Apr 19, 2021
AnnaTrainingG pushed a commit that referenced this pull request Jan 4, 2022
…ten::DenseTensor, test=allcases (PaddlePaddle#38473)

* Added shared_ptr<Allocation> member & corresponding interfaces to Storage

* Removed original pten::Allocation from Storage and adjusted the interfaces accordingly

* Fixed issues with storage offset

* Used place to malloc allocation for TensorStorage

* [Unify Tensors PR #3]Ported framework::Tensor interfaces to pten::DenseTensor

* Fixed issues with place

* Added comments

* Moved mutable_data with stream argument to DenseTensor

* Added set_offset interface

* Fixed CI issues,test=allcases

* [Unify Tensors PR #4] Port LoDTensor interfaces to DenseTensor

* Reverted changes to pten_layout() interface

* Removed friend classes
AnnaTrainingG pushed a commit that referenced this pull request Jan 13, 2022
…t=allcases (PaddlePaddle#38632)

* Added shared_ptr<Allocation> member & corresponding interfaces to Storage

* Removed original pten::Allocation from Storage and adjusted the interfaces accordingly

* Fixed issues with storage offset

* Used place to malloc allocation for TensorStorage

* [Unify Tensors PR #3]Ported framework::Tensor interfaces to pten::DenseTensor

* Fixed issues with place

* Added comments

* Moved mutable_data with stream argument to DenseTensor

* Added set_offset interface

* Fixed CI issues,test=allcases

* [Unify Tensors PR #4] Port LoDTensor interfaces to DenseTensor

* Removed friend class EigenTensor/EigenMatrix/EigenVector from Tensor

* Modified framework::Tensor to inherit from DenseTensor

* Reverted changes to pten_layout() interface

* Removed friend classes

* Rearranged function calls from tensor.data<void>() to tensor.data()

* Fixed CI issues

* Fixed lite issues

* Fixed data() interface issues,test=allcases

* Resolved IsInitialized() issues

* Fixed ResetHolder() issues

* Fixed MKLDNN & Storage issues

* Resolved ShareBufferWith() issues

* Fixed LoD issues
AnnaTrainingG pushed a commit that referenced this pull request Jan 13, 2022
…st=allcases (PaddlePaddle#38811)

* Added shared_ptr<Allocation> member & corresponding interfaces to Storage

* Removed original pten::Allocation from Storage and adjusted the interfaces accordingly

* Fixed issues with storage offset

* Used place to malloc allocation for TensorStorage

* [Unify Tensors PR #3]Ported framework::Tensor interfaces to pten::DenseTensor

* Fixed issues with place

* Added comments

* Moved mutable_data with stream argument to DenseTensor

* Added set_offset interface

* Fixed CI issues,test=allcases

* [Unify Tensors PR #4] Port LoDTensor interfaces to DenseTensor

* Removed friend class EigenTensor/EigenMatrix/EigenVector from Tensor

* Modified framework::Tensor to inherit from DenseTensor

* Reverted changes to pten_layout() interface

* Removed friend classes

* Rearranged function calls from tensor.data<void>() to tensor.data()

* Fixed CI issues

* Fixed lite issues

* Fixed data() interface issues,test=allcases

* Resolved IsInitialized() issues

* Fixed ResetHolder() issues

* Fixed MKLDNN & Storage issues

* Resolved ShareBufferWith() issues

* Fixed LoD issues

* Removed interfaces & members from lod_tensor,test=allcases
AnnaTrainingG pushed a commit that referenced this pull request Jan 26, 2022
PaddlePaddle#39128)

* Added selected_rows and rw_lock to pten

* Renamed the unit test target to fix CI

* Removed Class SelectedRows in Fluid, changed include/cmake relationship, use pten::SelectedRows in Fluid

* Remove rw_lock.h,rw_lock_test.cc in fluid

* Use pten::RWLock and pten::AutoRDLock, fix CI

* Use pten::SelectedRows

* Use pten::SelectedRows

* Fix to pass NPU CI

* Use pten::SelectedRows, to pass NPU CI

* To fix NPU CI

* To fix NPU CI again
AnnaTrainingG pushed a commit that referenced this pull request Mar 31, 2022
…Paddle#41051)

* [Refactor] refactored eager_gen.py PR #2

* [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes

* Fixed minor issue

* Adjusted logics of GenerateNodeCreationCodes and GenerateForwardDefinition

* Fixed issues

* Supported higher-order grad node generation

* [DoubleGrad PR #4] Supported higher-order GradNode generation

* Fixed yaml typo
AnnaTrainingG pushed a commit that referenced this pull request Jun 9, 2022
AnnaTrainingG pushed a commit that referenced this pull request Sep 19, 2022
Add shell script to download several datasets