
[cherry pick] Refine param conversion logic in layer.to #38058

Closed
wants to merge 195 commits

Commits on Sep 22, 2021

  1. Commit f72d52e
  2. Commit fb8be03
  3. [cherry-pick] Increase test_imperative_auto_mixed_precision PROPERTIES TIMEOUT (PaddlePaddle#35863) (PaddlePaddle#35898)

    Increase the test_imperative_auto_mixed_precision PROPERTIES TIMEOUT from 120s to 300s.
    zhangbo9674 authored Sep 22, 2021 (commit 1787936)
  4. [cherry-pick] Fix bug of module 'paddle' has no attribute 'fluid' for Python 3.6 (PaddlePaddle#35862) (PaddlePaddle#35900)

    Fix the "module 'paddle' has no attribute 'fluid'" error under Python 3.6.
    zhangbo9674 authored Sep 22, 2021 (commit c053520)
  5. Commit 2aaa417
  6. Commit bba41e4
  7. Commit 6cc8b16
  8. Commit 0f34483
  9. Commit c67cf85

Commits on Sep 23, 2021

  1. Commit e8e77eb
  2. op:transpose_op supports bool type (PaddlePaddle#35886) (PaddlePaddle#35926)

    * Pass compat of conv_transpose_bias_mkldnn_fuse_pass

    * Fix a bug in strided_slice op where the axes parameter accessed memory out of bounds

    * Fix a bug in transpose op where the perm parameter accessed memory out of bounds

    * op:transpose_op supports bool type
    TeslaZhao authored Sep 23, 2021 (commit 95c100c)
  3. Commit 91f25ee
  4. Commit 4629401

Commits on Sep 24, 2021

  1. Commit 063fca8
  2. [cherry-pick] Fix cusparse compile bug on Windows with CUDA 11.2, test=release/2.2 (PaddlePaddle#36015)

    Fix a compilation error with CUDA 11.2 on Windows.
    Cherry-pick of PaddlePaddle#35941.
    Liu-xiandong authored Sep 24, 2021 (commit 0e19aeb)
  3. [cherry-pick] inference: fix TensorRT problem (PaddlePaddle#35939)

    * update xpu version
    jiweibo authored Sep 24, 2021 (commit ae78940)
  4. Basic PR on Cost Model (PaddlePaddle#35774) (PaddlePaddle#35915)

    Add a basic Cost Model: it uses the executor to run a program and profiles it to obtain per-op time.

    This is an early, basic version; more functionality will be added in the future.
    zhhsplendid authored Sep 24, 2021 (commit efcd108)
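The cost-model idea above (run the program, record how long each op takes) can be sketched in plain Python. This is a toy illustration of the profiling loop only; the function and the two-op "program" below are hypothetical stand-ins, not Paddle's actual CostModel API.

```python
import time

def profile_ops(ops, repeat=10):
    """Run each (name, fn) "op" `repeat` times; return mean wall-clock seconds."""
    costs = {}
    for name, fn in ops:
        start = time.perf_counter()
        for _ in range(repeat):
            fn()
        costs[name] = (time.perf_counter() - start) / repeat
    return costs

# A hypothetical two-op "program" standing in for an executor-run program.
def matmul_like():
    return [sum(i * j for j in range(50)) for i in range(50)]

def relu_like():
    return [max(x, 0.0) for x in range(-50, 50)]

costs = profile_ops([("matmul", matmul_like), ("relu", relu_like)])
```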
  5. [cherry-pick] Replace Eigen with the Lapack library for the eigvals OP kernel (PaddlePaddle#35909) (PaddlePaddle#36038)

    This PR implements the kernel of the "eigvals" OP with the Lapack library, which performs better than the previous Eigen-based implementation.
    From00 authored Sep 24, 2021 (commit e9c0414)

Commits on Sep 25, 2021

  1. Commit 33fbdaf

Commits on Sep 26, 2021

  1. Commit 085eae2
  2. [cherry-pick] Split minimize and add unscale_ for GradScaler (PaddlePaddle#35927)

    1. Split GradScaler::minimize() into GradScaler::step() + GradScaler::update().
    2. Add GradScaler::unscale_(optimizer).
    zhangbo9674 authored Sep 26, 2021 (commit e262125)
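The minimize() split above can be illustrated with a pure-Python mock of the control flow. This is a sketch of the pattern only, not Paddle's GradScaler implementation; the optimizer step and scale handling are deliberately simplified.

```python
class MockGradScaler:
    """Mock of the split: minimize() used to do everything; now step()
    applies the optimizer update (unscaling first if needed) and update()
    finishes the iteration."""

    def __init__(self, init_scale=2.0 ** 15):
        self._scale = init_scale
        self._unscaled = False

    def scale(self, loss):
        # Multiply the loss so small fp16 gradients do not underflow.
        return loss * self._scale

    def unscale_(self, grads):
        # Divide gradients by the scale, at most once per iteration.
        if not self._unscaled:
            for i in range(len(grads)):
                grads[i] /= self._scale
            self._unscaled = True

    def step(self, grads, params, lr=0.1):
        self.unscale_(grads)          # no-op if the user already called it
        for i in range(len(grads)):
            params[i] -= lr * grads[i]

    def update(self):
        # A real scaler would also grow/shrink self._scale here.
        self._unscaled = False

params = [1.0]
grads = [0.5 * 2.0 ** 15]             # gradient computed from the scaled loss
scaler = MockGradScaler()
scaler.unscale_(grads)                # optional: inspect/clip unscaled grads
scaler.step(grads, params)
scaler.update()
```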
  3. fix pad tuple (PaddlePaddle#36043)

    * fix pad tuple

    * fix format
    littletomatodonkey authored Sep 26, 2021 (commit 2e473f2)
  4. [NPU] add randperm_op_npu (PaddlePaddle#35763) (PaddlePaddle#36026)

    * add randperm_op_npu

    * fix test_set_value_op_npu
    ronny1996 authored Sep 26, 2021 (commit df81915)
  5. [Cherry-Pick] Add paddle.linalg.solve OP (PaddlePaddle#35715) (PaddlePaddle#36056)

    This PR adds linalg.solve to Paddle's linear algebra module. Call paddle.linalg.solve to use it.
    veyron95 authored Sep 26, 2021 (commit 6b4f2fb)
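A solve API of this kind computes x with A x = b directly, rather than forming inv(A) @ b. A numpy sketch of the same semantics (assuming numpy is available; the Paddle API takes tensors instead of arrays):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)     # solved via factorization, not inv(A) @ b
```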
  6. [cherry-pick] Add function comments and instructions to the Primitive API (PaddlePaddle#36024)

    AnnaTrainingG authored Sep 26, 2021 (commit 05621f7)
  7. [cherry-pick] Add Det and Slogdet API to Release 2.2 (PaddlePaddle#36083)

    This PR adds the det and slogdet APIs to release/2.2.
    Cherry-picked from PaddlePaddle#34992 and PaddlePaddle#36013.
    zhhsplendid authored Sep 26, 2021 (commit ba2a1bb)
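Why offer slogdet alongside det: for matrices with very small (or very large) determinants, det overflows or underflows in float64, while slogdet returns the sign and log|det| stably. A numpy sketch of the same pair of APIs (assuming numpy is available):

```python
import numpy as np

# For a 400x400 diagonal matrix with entries 0.1, det == 1e-400 underflows
# to 0.0 in float64, while slogdet stays informative.
A = np.diag(np.full(400, 0.1))
sign, logabsdet = np.linalg.slogdet(A)
naive = np.linalg.det(A)              # underflows
```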
  8. Commit 14cdcde
  9. [cherry-pick] CPU forward calculation replaces Eigen with Lapack (PaddlePaddle#35916) (PaddlePaddle#36091)

    Cherry-pick of PaddlePaddle#35916: replace Eigen with Lapack in the CPU forward computation and adjust the linalg exposure rules.
    Zjq9409 authored Sep 26, 2021 (commit effb70f)
  10. Commit bc13ab9

Commits on Sep 27, 2021

  1. [cherry-pick] Support fixed seed in Python for test (PaddlePaddle#36065) (PaddlePaddle#36094)

    Users of gumbel_softmax can call paddle.seed() in Python to fix the random seed.
    YuanRisheng authored Sep 27, 2021 (commit c3a0eaa)
  2. [cherry-pick] Modify adam to adamw in Optimizer AdamW (PaddlePaddle#36028) (PaddlePaddle#36103)

    PR 35521 inadvertently changed the op used by the AdamW optimizer from adamw to adam; this change restores adamw.
    zhangbo9674 authored Sep 27, 2021 (commit 2de7a7f)
  3. Commit 6891134
  4. [Cherry-pick] Add new func/class API psroi_pool and UT (PaddlePaddle#36111)

    Cherry-picked from PaddlePaddle#35352.

    Add the new detection APIs paddle.vision.ops.psroi_pool and paddle.vision.ops.PSRoIPool.
    zoooo0820 authored Sep 27, 2021 (commit 81557da)
  5. Commit fe5cddf
  6. Commit 40a2918
  7. Commit 4bcff7b
  8. Commit b171aab
  9. Commit 5f168af
  10. remove linalg api in paddle.__init__ (PaddlePaddle#36112)

    Remove the recently added linalg APIs from paddle.__init__;
    add a 'name' argument to some of the new linalg API interfaces.
    zhiboniu authored Sep 27, 2021 (commit a57f081)
  11. Commit 45b7627
  12. Commit 1db28fd
  13. cherry-pick PaddlePaddle#36021: fix unique/unstack zero tensor (PaddlePaddle#36163)

    * fix unique/unstack for dim 0

    * fix unique_op format
    bjjwwang authored Sep 27, 2021 (commit 749bc24)
  14. Add paddle.device.cuda.get_device_properties (PaddlePaddle#35875)

    * Initial Commit

    * fix py2 error

    * fix wrong words and doc

    * test=document_fix

    * fix _gpuDeviceProperties
    Yanxing-Shi authored Sep 27, 2021 (commit cea0bc2)

Commits on Sep 28, 2021

  1. Commit c576169
  2. [cherry-pick] update multi_dot exposure rules (PaddlePaddle#36018) (PaddlePaddle#36131)

    Update the multi_dot API exposure following the linear algebra library's exposure rules:
    1. Implement it under python/paddle/tensor/linalg.py.
    2. Import it in python/paddle/linalg.py and add it to the __all__ list.
    3. Import it in python/paddle/tensor/__init__.py and add it to the tensor_method_func list.
    4. Remove the import from python/paddle/__init__.py.
    zkh2016 authored Sep 28, 2021 (commit 632a006)
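For what multi_dot itself does: it multiplies a chain of matrices while choosing the cheapest parenthesization. paddle.linalg.multi_dot mirrors numpy's API, sketched here with numpy (assuming numpy is available):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 100))
B = rng.standard_normal((100, 5))
C = rng.standard_normal((5, 50))

# multi_dot picks the cheapest parenthesization of the chain, here (A B) C:
# (10x100)(100x5) then (10x5)(5x50) is far cheaper than B C first.
out = np.linalg.multi_dot([A, B, C])
```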

Commits on Sep 29, 2021

  1. Add roi pool (PaddlePaddle#35084) (PaddlePaddle#36154)

    * add roi pool

    * rename input as x
    lyuwenyu authored Sep 29, 2021 (commit b0289de)
  2. [cherry-pick] fix paddle.device.cuda.get_device_properties doc (PaddlePaddle#36174)

    * test=document_fix
    Yanxing-Shi authored Sep 29, 2021 (commit dd14f7f)
  3. Add op paddle.device.cuda.get_device_name and paddle.device.cuda.get_device_capability (PaddlePaddle#36172)

    * add get_device_name and get_device_capability

    * fix docs
    liyagit21 authored Sep 29, 2021 (commit 96fd98b)
  4. add API paddle.linalg.eig (PaddlePaddle#35674) (PaddlePaddle#36188)

    Add the eig operator to PaddlePaddle's linear algebra library; it computes the eigendecomposition of a general square matrix.
    Cherry-picked from PaddlePaddle#35674.
    AshburnLee authored Sep 29, 2021 (commit 4e2daa9)
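An eig API of this kind returns eigenvalues and right eigenvectors of a general (not necessarily symmetric) square matrix, satisfying A v_i = w_i v_i. A numpy sketch of the same contract (assuming numpy is available):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])        # general non-symmetric square matrix
w, v = np.linalg.eig(A)           # eigenvalues w; eigenvectors in columns of v
```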

Commits on Sep 30, 2021

  1. [cherry-pick] add roi align (PaddlePaddle#36207)

    Add roi_align; cherry-pick of PaddlePaddle#35102.
    nemonameless authored Sep 30, 2021 (commit dcd17d6)
  2. Commit 87cc8d4
  3. Commit e8efba5
  4. Commit 789012c
  5. Fix raw optim (PaddlePaddle#36176) (PaddlePaddle#36231)

    * fix raw optim

    * pre-commit test file

    Co-authored-by: sneaxiy <[email protected]>
    youth123 and sneaxiy authored Sep 30, 2021 (commit 28d1200)
  6. add optest for adamw (PaddlePaddle#36148) (PaddlePaddle#36239)

    * update func name

    * skip cpu

    * update unittest
    zhaoyinglia authored Sep 30, 2021 (commit 70e6784)

Commits on Oct 11, 2021

  1. [cherry-pick] fix hasattr(paddle.fluid.ir.PassDesc.OP, '__name__') error (PaddlePaddle#36294)

    For arguments that do not satisfy the conditions after overloading __getattr__, raise AttributeError in every case, matching the behavior of the non-overloaded version.

    (cherry picked from PR PaddlePaddle#36229)
    Avin0323 authored Oct 11, 2021 (commit 45de931)
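The contract behind this fix: hasattr(obj, name) returns False only when attribute lookup raises AttributeError, so an overloaded __getattr__ must raise AttributeError (not some other exception, and not return junk) for every unsupported name. A minimal sketch in plain Python; the class and op registry are illustrative, not Paddle's PassDesc:

```python
class OpProxy:
    """Illustrative attribute proxy with an op-name registry."""

    _registered = {"conv2d", "matmul"}

    def __getattr__(self, name):
        # Called only when normal lookup fails.
        if name in self._registered:
            return "<op {}>".format(name)
        # Raise AttributeError for every unsupported name, matching the
        # behavior of a class without the overload; hasattr() then works.
        raise AttributeError(name)

ops = OpProxy()
```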
  2. [cherry-pick] C++ support register pass via PassDesc (PaddlePaddle#36302)

    (cherry picked from PR PaddlePaddle#36095)

    Main functionality: support registering a GeneratePass from C++, simplifying development for subgraph-optimization scenarios such as fusion.
    Avin0323 authored Oct 11, 2021 (commit 21c65f6)
  3. Commit 31a5829

Commits on Oct 12, 2021

  1. Commit 10eebfa
  2. Fix stop_gradient in RunProgramOp (PaddlePaddle#36339) (PaddlePaddle#36353)

    * Fix stop_gradient in RunProgramOp

    * fix reference
    Aurelius84 authored Oct 12, 2021 (commit a6868c9)

Commits on Oct 13, 2021

  1. Commit ce6a27d
  2. [cherry-pick] change paddle.mm api to matmul_v2 op (PaddlePaddle#36374)

    * change paddle.mm to matmul_v2

    * update the code for mm

    * update the document for mm
    wawltor authored Oct 13, 2021 (commit 7a66160)
  3. delete remove_static_file() function in error.py (PaddlePaddle#36153) (PaddlePaddle#36375)

    * change time to remove static tempfile

    * delete remove_static_file() function
    0x45f authored Oct 13, 2021 (commit a5767bb)

Commits on Oct 14, 2021

  1. Commit 976f014

Commits on Oct 15, 2021

  1. [cherry-pick] add sparse_embedding doc (PaddlePaddle#36312)

    * add sparse_embedding doc

    * modify sample code

    * fix sample code error
    Yanxing-Shi authored Oct 15, 2021 (commit fc429fe)
  2. [cherry-pick] Verify the correctness of graph rewritten by GeneratePass (PaddlePaddle#36453)

    * [WIP] Verify the correctness of graph rewritten by GeneratePass, test=develop

    * add delete subgraph and unittest, test=develop

    * check simple pass, test=develop

    * fix coverage, test=develop

    * limit with input_spec via Paddle API, test=develop
    Avin0323 authored Oct 15, 2021 (commit cc44965)

Commits on Oct 18, 2021

  1. [Cherry-pick][Dy2stat] fix no_grad context error in train mode when using save/load (PaddlePaddle#36434) (PaddlePaddle#36463)

    Fix an issue where, after loading a model with the jit.save/load interface, GPU memory keeps growing in train mode inside a no_grad context.
    0x45f authored Oct 18, 2021 (commit 2b9d192)

Commits on Oct 19, 2021

  1. Add operators for async read & async write (PaddlePaddle#36333) (PaddlePaddle#36501)
    
    * fix async_read bug
    
    * change index place to cpu
    
    * add tensor size judge
    
    * add async_read & async_write test
    
    * fix bug in async_write
    
    * fix mac py3 ci
    
    * fix bug for cpu version paddle
    
    * fix windows ci bug
    
    * change input argument error type
    
    * change const_cast to mutable_data
    
    * add async_write out-of-bound check and consumate error hint
    
    * fix a small bug for dst_tensor
    
    * add docs and refine codes
    
    * refine docs
    
    * notest,test=windows_ci
    
    * fix windows ci
    
    * fix require
    
    * fix code-block
    
    * add core.is_compiled_with_cuda()
    DesmonDay authored Oct 19, 2021 (commit d65f8af)
  2. quant support matmul_v2 (PaddlePaddle#36469) (PaddlePaddle#36499)

    * quant support matmul_v2

    * fix format
    ceci3 authored Oct 19, 2021 (commit b8167ed)
  3. Commit d974dbd
  4. [cherry-pick] Add sparse attention cherrypick (PaddlePaddle#36447)

    The code in this PR supports only CUDA 11.2. CI currently has no GPU with CUDA 11.2, so all tests are skipped automatically.

    The new OP is paddle._C_ops.sparse_attention. The Python API will be added in a follow-up PR.

    This PR lacks tests on dynamic and static graphs; they will be added in subsequent PRs.
    Liu-xiandong authored Oct 19, 2021 (commit 36edb0e)

Commits on Oct 20, 2021

  1. catch the generator function and intercept it (PaddlePaddle#35369) (PaddlePaddle#36536)

    * catch the generator function and intercept it

    * add test generator

    * add test case

    * refine the test case
    2742195759 authored Oct 20, 2021 (commit 023eb3f)
  2. Commit b5404f0

Commits on Oct 21, 2021

  1. remove no_value using var.name (PaddlePaddle#36513) (PaddlePaddle#36565)

    * remove no_value using var.name
    0x45f authored Oct 21, 2021 (commit 6a20205)
  2. improve replicate pad error information (PaddlePaddle#36531)

    * fix replicate pad when input size is 0

    * add unit test
    littletomatodonkey authored Oct 21, 2021 (commit a201a69)
  3. [Cherry-pick] Add functor_primitives.h for kernel primitive api (PaddlePaddle#36418)

    * Add functor_primitives.h for the kernel primitive API
    AnnaTrainingG authored Oct 21, 2021 (commit 3090988)

Commits on Oct 22, 2021

  1. Fix a bug in ReadData, ReadDataBc and ReadDataReduce when NX != 1 (PaddlePaddle#36373) (PaddlePaddle#36616)

    * Fix a bug in ReadData, ReadDataBc and ReadDataReduce when NX != 1
    * Update the implementation of reduceAnyKernel according to the kernel primitive API
    AnnaTrainingG authored Oct 22, 2021 (commit 6840cf5)

Commits on Oct 23, 2021

  1. Add viterbi decode (PaddlePaddle#35778) (PaddlePaddle#36615)

    * add viterbi decode cpu kernel
    
    * add viterbi decoder api in paddle.text
    
    * add a data buffer once to avoid create many small pieces of data buffer frequently
    
    * fix viterbi max_seq_length bug
    
    * fix seq_len=1 bug
    
    * fix device context
    
    * move split out of for loop
    
    * remove INVERSE_SUB
    
    * remove 2 GET_CAST_MASK
    
    * remove 1 loop
    
    * remove Functor
    
    * add to_static deploy code
    
    * use MAX_FUNC instead of ELE_MAX
    
    * add MaxFunctor
    
    * impl max_func
    
    * remove MaxFunctor
    
    * remove cast op
    
    * use REGISTER_OP_WITHOUT_GRADIENT
    
    * add viterbi cuda kernel
    
    * add FIX_BLOCKDIM_CASE macro
    
    * add MKL add, mul; add get data mask
    
    * add arange mkl impl
    
    * add CPU Argmax
    
    * add cpu gather
    
    * use EXECUTE_MKL_ELEMENT_BINARY_OP instead of some ADD, MUL
    
    * use SameDimsBinaryOP instead of EXECUTE_MKL_ELEMENT_BINARY_OP
    
    * use SAME_DIMS_ELEMENT_BINARY_OP
    
    * add SimpleBroadcastBinaryOP
    
    * use int instead of int64_t to accelerate
    
    * optimize SimpleBroadcastBinaryOP
    
    * optimize SimpleBroadcastBinaryOP
    
    * optimize performance in both single thread and multithread situation
    
    * remove useless line
    
    * remove useless code
    
    * add CREATE_TENSOR_BUFFER macro
    
    * add INIT_REQUIRED_TENSOR macro
    
    * add comment
    
    * fix windows ci
    
    * add viterbi unittest
    
    * remove cuda add functor
    
    * remove cuda equal
    
    * remove a template function
    
    * fix windows ci
    
    * fix windows dtype
    
    * remove some template instance
    
    * remove useless header file
    
    * remove some blockdim
    
    * remove transpose impl
    
    * accelerate cpu performance on single thread situation
    
    * viterbi_decode->crf_decode
    
    * rename crf params name
    
    * add viterbi api test
    
    * remove useless import
    
    * add enable_static
    
    * use viterbi decoder
    
    * fix viterbi len=1
    
    * fix viterbi unittest
    
    * remove useless comments
    
    * reconstruct viterbi decode
    
    * remove ADD,SUB,MUL structure
    
    * fix coverage
    
    * remove CREATE_TENSOR
    
    * add name args
    
    * crf.py->ops.py; with_start_stop_tag->include_start_end_tag
    
    * update crf_decode en docs
    
    * fix viterbi decode en docs
    
    * fix some review comments
    
    * add FIXED_BLOCK_DIM_CASE in cuda
    
    * push_back->emplace_back
    
    * crf_decode->viterbi_decode; include_start_end_tag->include_bos_eos_tag
    
    * paddle.text.ops.viterbi_decode->paddle.text.viterbi_decode
    
    * fix viterbi_decode en docs
    joey12300 authored Oct 23, 2021 (commit 1906c74)
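For readers unfamiliar with the algorithm behind the commit above: Viterbi decoding finds the highest-scoring tag sequence given per-step emission scores and tag-to-tag transition scores via dynamic programming plus backtracking. A minimal numpy sketch of the algorithm itself (assuming numpy is available), not Paddle's kernel, which also handles batching, sequence lengths, and BOS/EOS tags:

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Best tag path for one sequence.

    emissions: [T, N] per-step tag scores; transitions: [N, N] with
    transitions[i, j] = score of moving from tag i to tag j.
    Returns (best_score, best_path)."""
    T, N = emissions.shape
    score = emissions[0].copy()            # best score ending in each tag
    backpointers = []
    for t in range(1, T):
        cand = score[:, None] + transitions  # [N, N] candidate scores
        backpointers.append(cand.argmax(axis=0))
        score = cand.max(axis=0) + emissions[t]
    path = [int(score.argmax())]
    for prev in reversed(backpointers):    # walk the backpointers
        path.append(int(prev[path[-1]]))
    path.reverse()
    return float(score.max()), path

em = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
tr = np.array([[0.5, -0.5], [-1.0, 1.0]])
score, path = viterbi_decode(em, tr)
```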

Commits on Oct 25, 2021

  1. Add fused_dropout wrapper to ease use (PaddlePaddle#36185) (PaddlePaddle#36640)

    The fused_attention and fused_ffn ops use fused bias_add+dropout+residual+layernorm or bias_add+dropout+residual kernels. To ease use of these kernels, this PR provides a wrapper.
    1. To reuse the increment-computing code, the corresponding code is extracted into the "GetSeedDataAndIncrement" routine in dropout_impl_util.h.
    2. fused_dropout_helper.h provides the fused dropout kernel wrapper.

    Note: tests for this wrapper will be provided in the upcoming fused_attention_op and fused_ffn PRs.
    limin2021 authored Oct 25, 2021 (commit 05d7e2f)
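The math the fused kernel covers can be written out op by op. A numpy sketch of the composition only (assuming numpy is available); the function name and arguments are illustrative, and the real kernel fuses these steps into one CUDA launch with its own RNG:

```python
import numpy as np

def bias_dropout_residual_layernorm(x, bias, residual, p, gamma, beta,
                                    seed=0, eps=1e-5):
    """out = layernorm(residual + dropout(x + bias)), composed step by step."""
    rng = np.random.default_rng(seed)
    y = x + bias
    keep = (rng.random(y.shape) >= p).astype(y.dtype)
    y = y * keep / (1.0 - p)                 # inverted dropout
    y = y + residual                          # residual connection
    mean = y.mean(axis=-1, keepdims=True)     # layernorm over the last axis
    var = y.var(axis=-1, keepdims=True)
    return gamma * (y - mean) / np.sqrt(var + eps) + beta

x = np.ones((2, 4))
out = bias_dropout_residual_layernorm(x, np.zeros(4), np.ones((2, 4)),
                                      p=0.5, gamma=1.0, beta=0.0)
```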
  2. Add fused_attention_op: add impl wrappers (PaddlePaddle#35903) (PaddlePaddle#36673)

    Goal: improve the compute performance of the attention module.
    To reduce the framework's op-scheduling overhead, this PR hand-implements the attention module at the C++ level and exposes it as a single large attention op.
    To reduce memory-access overhead, it applies two optimizations:
    (1) when computing q, k and v, sharing the input X reduces the gemm, transpose and bias-add there from three calls to one;
    (2) kernel fusion passes data between CUDA kernels through registers.
    limin2021 authored Oct 25, 2021 (commit 8c0bacd)
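Optimization (1) above, fusing the q/k/v projections that share input X into one gemm, can be checked numerically. A numpy sketch (assuming numpy is available; shapes and weight names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
X = rng.standard_normal((4, d))              # shared input
Wq = rng.standard_normal((d, d))
Wk = rng.standard_normal((d, d))
Wv = rng.standard_normal((d, d))

# Three separate projections: three gemms over the same X.
q, k, v = X @ Wq, X @ Wk, X @ Wv

# One gemm on the concatenated weight, then a split: same results,
# one kernel launch instead of three.
qkv = X @ np.concatenate([Wq, Wk, Wv], axis=1)
q2, k2, v2 = np.split(qkv, 3, axis=1)
```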
  3. Commit 2bfee7d
  4. [Cherry Pick] refine comments for GradScaler state_dict (PaddlePaddle#36522) (PaddlePaddle#36671)

    Refine comments for GradScaler state_dict.
    zhangbo9674 authored Oct 25, 2021 (commit 304fb2b)
  5. Commit bd40dd9
  6. Add nn.functional.sparse_attention and some test cases, test=develop (PaddlePaddle#35757) (PaddlePaddle#36551)

    Add the paddle.nn.functional.sparse_attention API.

    This PR mainly wraps the sparse_attention functionality at the Python layer; the main OP code is in PR35676.

    It also adds unit tests for the wrapped Python interface.
    Liu-xiandong authored Oct 25, 2021 (commit c57d1e9)
  7. Commit 8ebee86
  8. Commit 6ecfe80
  9. Commit a9b7d1d
  10. Commit 7612bf1
  11. cherry-pick (PaddlePaddle#36653)

    Cherry-picked PRs:

    PaddlePaddle#36568: fix fc fuse compat problem
    PaddlePaddle#36610: support lite xpu choose device id
    PaddlePaddle#36010: update lite branch
    PaddlePaddle#36628: add file exists check
    jiweibo authored Oct 25, 2021 (commit cb33835)
  12. Commit 0951bfd
  13. Commit a540769
  14. Commit 5f1b193
  15. Commit bdcc2ad
  16. Commit 4d3c7f3
  17. [cherry-pick] Fix grid sampler (PaddlePaddle#36625)

    * Fix grid sampler

    * Fix code format
    wanghaoshuang authored Oct 25, 2021 (commit 668db93)
  18. [cherry-pick 2.2] static model parallel dropout support deterministic RandomSeedGenerator (PaddlePaddle#36682)

    * Revert "Add fused_dropout wrapper to ease use. (PaddlePaddle#36185) (PaddlePaddle#36640)"

    This reverts commit 05d7e2f.

    * [hybrid] seed and dropout op support force-cpu (PaddlePaddle#35820)

    * [HIP] fix op not supporting AMD GPU; the flag PADDLE_WITH_ROCM is invalid

    * [hybrid] fix seed ci failed issue

    * add AsExtra for force_cpu of seed op

    * Add fused_dropout wrapper to ease use. (PaddlePaddle#36185)

    * [hybrid] static model parallel dropout support deterministic RandomSeedGenerator (PaddlePaddle#36228)

    Co-authored-by: xiayanming <[email protected]>
    Co-authored-by: Li Min <[email protected]>
    3 people authored Oct 25, 2021 (commit 59615ff)

Commits on Oct 26, 2021

  1. Commit 37ac0dd
  2. [cherry-pick] Support CPU Parallel in DataParallel Interface by GLOO to speed up training (PaddlePaddle#35745) (PaddlePaddle#36605)

    * User-specified backend (PaddlePaddle#35745)

    * remove tensordot
    2742195759 authored Oct 26, 2021 (commit beb920c)
  3. Commit 3fbb664
  4. [cherry-pick-2.2] Fused attention op forward (PaddlePaddle#35905) (P…

    …addlePaddle#36708)
    
    Goal: this PR improves the computational performance of the attention module.
    To reduce the framework's op-scheduling overhead, the attention module is implemented by hand at the C++ level and exposed as one large attention op.
    To reduce memory-access overhead, two optimizations are applied:
    (1) when computing q, k, and v, the input X is shared, reducing the gemm, transpose, and bias-add there from three calls to one;
    (2) kernel fusion is used so that data is passed between CUDA kernels through registers.
    limin2021 authored Oct 26, 2021
    Commit d2be870
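The shared-input optimization described above can be sketched in plain Python (an illustrative toy, not the actual C++/CUDA implementation; the matrices and the `matmul` helper are made up for the example): column-concatenating Wq, Wk, and Wv lets a single GEMM read X once instead of three times.

```python
def matmul(a, b):
    # naive dense matrix multiply over lists of rows
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

X = [[1.0, 2.0], [3.0, 4.0]]     # toy activations: 2 tokens, hidden dim 2
Wq = [[1.0, 0.0], [0.0, 1.0]]
Wk = [[2.0, 0.0], [0.0, 2.0]]
Wv = [[0.5, 0.0], [0.0, 0.5]]

# unfused: three GEMMs, X is read three times
q, k, v = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)

# fused: one GEMM against the column-concatenated weight [Wq | Wk | Wv]
W_qkv = [rq + rk + rv for rq, rk, rv in zip(Wq, Wk, Wv)]
qkv = matmul(X, W_qkv)
q2 = [row[0:2] for row in qkv]
k2 = [row[2:4] for row in qkv]
v2 = [row[4:6] for row in qkv]

assert (q, k, v) == (q2, k2, v2)   # same result, one pass over X
```

The real op additionally fuses the transpose and bias-add into the same pass, which this sketch omits.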
  5. Commit 32fe5a4
  6. add slot record support for GpuPS (PaddlePaddle#36723)

    * add slotrecord datafeed (PaddlePaddle#36099)
    
    * fix multi-node (PaddlePaddle#36329)
    yaoxuefeng6 authored Oct 26, 2021
    Commit 53480c9
  7. [Amp] refine code of amp level (PaddlePaddle#36362) (PaddlePaddle#36726)

    * refine amp level
    
    * fix typo
    
    * update tracer._amp_level
    zhiqiu authored Oct 26, 2021
    Commit 1ee4fc3
  8. Support various length support for SelectedRows in GLOO::AllGather (P…

    …addlePaddle#36637) (PaddlePaddle#36722)
    
    Support variable-length SelectedRows in GLOO::AllGather (PaddlePaddle#36637)
    
        In CPU parallel training using Gloo, add variable-length support for SelectedRows
    2742195759 authored Oct 26, 2021
    Commit fced11b
  9. Commit 616ce20
  10. Add bincount op (PaddlePaddle#36317) (PaddlePaddle#36709)

    * Add bincount op
    
    * upload cpu version
    
    * fix unitest
    
    * fix unittest
    
    * fix unittest
    
    * fix en doc
    
    * add more test
    
    * fix en doc
    
    * add more test case
    
    * fix test
    
    * fix input validation
    
    * fix input check
    
    * fix unittest
    
    * fix test
    
    * fix en doc
    
    cherry-pick
    smallv0221 authored Oct 26, 2021
    Commit 610a810
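For reference, the semantics of a bincount op can be sketched in a few lines of plain Python (a behavioral sketch assuming the usual numpy-style `minlength` parameter; not Paddle's implementation):

```python
def bincount(xs, minlength=0):
    """Count occurrences of each non-negative integer in xs."""
    size = max(max(xs) + 1 if xs else 0, minlength)
    out = [0] * size
    for x in xs:
        out[x] += 1
    return out

print(bincount([0, 1, 1, 3]))           # [1, 2, 0, 1]
print(bincount([0, 1], minlength=4))    # [1, 1, 0, 0]
```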
  11. Pool3d 2.0 (PaddlePaddle#36545) (PaddlePaddle#36721)

    feng_shuai authored Oct 26, 2021
    Commit dfda193
  12. [cherry-pick]add op: fused_feedforward(forward) (PaddlePaddle#36729)

    This is a fusion operator that computes the feed-forward layer in the transformer model architecture.
    zhangkaihuo authored Oct 26, 2021
    Commit 77034fc
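As a rough sketch of what the fused operator covers, the unfused feed-forward computation looks like this in plain Python (toy matrices, no bias/dropout/layer-norm; all names here are illustrative, not Paddle code):

```python
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def feed_forward(x, w1, w2):
    # two linear layers with an activation in between, plus the residual;
    # the fused op computes this chain (with bias/dropout/layernorm) in
    # far fewer kernel launches
    h = relu(matmul(x, w1))
    y = matmul(h, w2)
    return [[xi + yi for xi, yi in zip(xr, yr)] for xr, yr in zip(x, y)]

I = [[1.0, 0.0], [0.0, 1.0]]
print(feed_forward([[1.0, -1.0]], I, I))   # [[2.0, -1.0]]
```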
  13. [cherry-pick]Support FP16 in HybridParallel and Fix bugs in HybridOpt…

    …imizer (PaddlePaddle#36707)
    
    * fix bugs in HybridParallelClipGrad of hybrid_parallel_optimizer (PaddlePaddle#36237)
    
    * fix bugs in HybridParallelClipGrad of hybrid_parallel_optimizer
    
    * update
    
    * update
    
    * fix bugs in mp_layers、pp_layers and HybridParallelClipGrad (PaddlePaddle#36144)
    
    * fix calling bug of HybridParallelClipGrad
    
    * fix bugs of HybridParallelClipGrad
    
    * add unittest of pp with HybridParallelClipGrad
    
    * fix bugs in mp_layers.py
    
    * update
    
    * fix bugs in pp_layers.py
    
    * update
    
    * [HybridParallel]Rebuild code for pipeline (PaddlePaddle#36396)
    
    * add no_sync for parameters sync
    
    * add pipeline for moe
    
    * [HybridParallel]Support fp16 in dygraph hybrid parallel (PaddlePaddle#36420)
    
    * [HybridParallel]Support fp16 in dygraph hybrid parallel
    
    * update
    
    * update
    
    * update for recompute
    
    * add unittest of pp+fp16
    
    * add unittest of recompute+fp16
    
    * update
    
    * modify ut
    
    * modify ut of cond (PaddlePaddle#36475)
    
    * fix bugs of ClipGradByGlobalNorm in HybridParallel (PaddlePaddle#36555)
    
    * fix bugs of ClipGradByGlobalNorm
    
    * add unittests
    
    * add unittests
    
    * [HybridParallel]fix bug of check_inf in fleet_base.py (PaddlePaddle#36651)
    
    * fix bug of check_inf
    
    * fix allreduce
    
    * support ClipGradByGlobalNorm in sharding (PaddlePaddle#36012)
    
    * support ClipGradByGlobalNorm in sharding
    
    * support ClipGradByGlobalNorm in sharding
    
    * test=allcase
    
    * Update test_linalg_cond.py
    
    * Update hybrid_parallel_util.py
    
    * Update hybrid_parallel_util.py
    
    Co-authored-by: ShenLiang <[email protected]>
    Co-authored-by: zhaoyingli <[email protected]>
    3 people authored Oct 26, 2021
    Commit 5b357e0
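For context, ClipGradByGlobalNorm (which several of the fixes above touch) rescales all gradients by one common factor when their global norm exceeds the threshold. A minimal sketch in plain Python (illustrative only; the hybrid-parallel implementation also aggregates the squared norm across ranks before scaling):

```python
import math

def clip_by_global_norm(grads, clip_norm):
    # grads: list of gradient vectors (lists of floats)
    global_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if global_norm <= clip_norm or global_norm == 0.0:
        return grads                        # no clipping needed
    scale = clip_norm / global_norm
    return [[g * scale for g in vec] for vec in grads]

# global norm is 5.0, so everything is scaled by 1.0 / 5.0
clipped = clip_by_global_norm([[3.0], [4.0]], clip_norm=1.0)
```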
  14. [cherry pick] add op: fused_feedforward(backward) (PaddlePaddle#36730)

    * add op: fused_feedforward(backward) (PaddlePaddle#35611)
    
    This PR adds the backward code for fused_feedforward.
    
    Related kernel implementations: fused_dropout_act_bias, fused_residual_dropout_bias, fused_layernorm_residual_dropout_bias.
    
    fused_feedforward is a fusion operator that fuses and wraps the operators of the transformer model's feed-forward layer, so the frontend exposes a single interface; the fusion cuts part of the memory-access and kernel-launch time, improving performance.
    
    * Move fused_attention and fused_feedforward functional api path to incubate (PaddlePaddle#36704)
    
    Moves the Python API added in PaddlePaddle#35905 and PaddlePaddle#35843 to the incubate directory.
    zhangkaihuo authored Oct 26, 2021
    Commit 76c1bae
  15. [Cherry-pick] Add FasterTokenizer Operator (PaddlePaddle#36716)

    * Add FasterTokenizer Operator (PaddlePaddle#34491)
    
    Add tokenizer-related functionality for the Transformer model so that the training and prediction processes are consistent.
    
    * support the text string as an input Tensor
    * support the "VOCAB" unordered_map<wstring, int> as an input Tensor to look up tokens
    * Tokenizer used for BERT. This tokenizer applies an end-to-end, text-string-to-wordpiece tokenization.
    * It first applies basic tokenization, followed by wordpiece tokenization.
    
    * optimize fast tokenizer
    
    * remove const_cast
    
    Co-authored-by: zhoushunjie <[email protected]>
    Co-authored-by: wawltor <[email protected]>
    3 people authored Oct 26, 2021
    Commit edff5b7
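The wordpiece step mentioned above is a greedy longest-match-first subword split; a simplified sketch with a toy vocabulary (the real tokenizer also performs basic tokenization, lowercasing, and unicode handling first):

```python
def wordpiece(word, vocab):
    # greedy longest-match-first subword split (BERT-style)
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece       # continuation prefix
            if piece in vocab:
                pieces.append(piece)
                break
            end -= 1
        else:
            return ["[UNK]"]               # no subword matched
        start = end
    return pieces

vocab = {"un", "##aff", "##able"}
print(wordpiece("unaffable", vocab))       # ['un', '##aff', '##able']
```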
  16. [Paddle-Inference]Add MatmulV2ToMatmul convert Pass, fix (matmul_v2, …

    …matmul, mul) convert pass, fix (matmul, mul) op_teller (PaddlePaddle#36652) (PaddlePaddle#36737)
    Wangzheee authored Oct 26, 2021
    Commit 30ce925
  17. fix wrong trt dim when input dim is 2 (PaddlePaddle#36614) (PaddlePad…

    …dle#36732)
    
    * fix wrong trt dim when input dim is 2
    
    * update leaky_relu and instance_norm converter unit test
    
    * add instance_norm input dim check
    baoachun authored Oct 26, 2021
    Commit da6e514
  18. Commit 211cf20

Commits on Oct 27, 2021

  1. Add fused attention op backward and python layer. (PaddlePaddle#36498) (

    PaddlePaddle#36752)
    
    Goal: this PR improves the computational performance of the attention module.
    To reduce the framework's op-scheduling overhead, the attention module is implemented by hand at the C++ level and exposed as one large attention op.
    To reduce memory-access overhead, two optimizations are applied:
    (1) when computing q, k, and v, the input X is shared, reducing the gemm, transpose, and bias-add there from three calls to one;
    (2) kernel fusion is used so that data is passed between CUDA kernels through registers.
    limin2021 authored Oct 27, 2021
    Commit 64643d5
  2. fix BatchNorm for fp16 (PaddlePaddle#36376) (PaddlePaddle#36691)

    * fix BatchNorm for fp16
    GuoxiaWang authored Oct 27, 2021
    Commit 417b22d
  3. Commit 3fc24e0
  4. Commit 9d2e092
  5. Commit b080d98
  6. Modify paddle.static.nn.cond doc (PaddlePaddle#36694) (PaddlePaddle#3…

    …6767)
    
    Update `cond` English document
    zhhsplendid authored Oct 27, 2021
    Commit c542d57
  7. bugfix: only check backend when mode == Collective (PaddlePaddle#36758) (

    PaddlePaddle#36772)
    
    * bugfix: only check backend when mode == Collective
    2742195759 authored Oct 27, 2021
    Commit 5402f8e
  8. [cherry-pick]Fused transformer encoder layer and fused feedforward l…

    …ayer PaddlePaddle#36776
    
    This PR adds the layer-level code for fused_transformer, including the FusedFeedForward layer and the FusedTransformerEncoderLayer.
    zhangkaihuo authored Oct 27, 2021
    Commit e1b5b1d
  9. Commit 7cb7535

Commits on Oct 28, 2021

  1. show paddle traceback after last user code traceback (PaddlePaddle#36741

    ) (PaddlePaddle#36765)
    
    show paddle traceback after last user code traceback
    0x45f authored Oct 28, 2021
    Commit 96edcea
  2. [Cherry-pick]FFT function enhancements and bugfixes (PaddlePaddle#36537)

    * update fft api path (PaddlePaddle#36219)
    
    * update fft api path
    * add sample code for ihfft2
    
    Co-authored-by: chenfeiyu <[email protected]>
    
    * fix fft axis (PaddlePaddle#36321)
    
    fix: `-1` is used when fft's axis is `0`
    
    * use unified external error message for cufft api (PaddlePaddle#36114)
    
    * fft: modify sample code result (PaddlePaddle#36325)
    
    * dynamically load mkl as a fft backend when it is available and requested (PaddlePaddle#36414)
    
    * add rocm support for fft api (PaddlePaddle#36415)
    
    * move signal apis
    
    * move fft and signal API path (#2)
    
    * move signal apis
    
    * move fft.py and signal.py to paddle/, fix typos
    
    * fix relative imports from fft.py and signal.py
    
    * fix typos in signal.py (#3)
    
    * move signal apis
    
    * move fft.py and signal.py to paddle/, fix typos
    
    * fix relative imports from fft.py and signal.py
    
    * fix typos
    
    * disable Cache when CUFFT_VERSION >= 10200 (#4)
    
    * move signal apis
    
    * move fft.py and signal.py to paddle/, fix typos
    
    * fix relative imports from fft.py and signal.py
    
    * fix typos
    
    * Add LRUCache for fft plans
    
    * add LRUCache for cuff and hipfft (#5)
    
    * move signal apis
    
    * move fft.py and signal.py to paddle/, fix typos
    
    * fix relative imports from fft.py and signal.py
    
    * fix typos
    
    * WIP: add cache
    
    * delete move constructor and operator= for CuFFTHandle and FFTConfig
    
    * remove log from CuFFTHandle and FFTConfig
    
    * add lrucache for fft rocm backend
    
    * disable LRUCache when CUFFT_VERSION >= 10200
    
    * disable copy and move for hipFFTHandle; format code
    
    Co-authored-by: Xiaoxu Chen <[email protected]>
    
    * remove debug message of cufftHandler
    
    * roll_op: support Tensor as input for shifts (PaddlePaddle#36727)
    
    * fix fftshift/ifftshift on static mode
    
    * update roll_op version
    
    * add more test cases for fftshift/ifftshift
    
    Co-authored-by: zhiboniu <[email protected]>
    Co-authored-by: chenfeiyu <[email protected]>
    Co-authored-by: LJQ❤️ <[email protected]>
    4 people authored Oct 28, 2021
    Commit 11b9f5f
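The LRU plan cache added for the cuFFT/hipFFT backends can be sketched generically (a minimal sketch using `OrderedDict`; the key shape and capacity here are illustrative, not Paddle's actual values):

```python
from collections import OrderedDict

class PlanCache:
    """Keep the most recently used FFT plans, evicting the oldest."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self._plans = OrderedDict()

    def get(self, key, create_plan):
        if key in self._plans:
            self._plans.move_to_end(key)     # mark as most recently used
            return self._plans[key]
        plan = create_plan(key)
        self._plans[key] = plan
        if len(self._plans) > self.capacity:
            self._plans.popitem(last=False)  # evict least recently used
        return plan

cache = PlanCache(capacity=2)
made = []
plan = lambda key: made.append(key) or key   # records every plan creation
cache.get((64,), plan); cache.get((128,), plan)
cache.get((64,), plan)    # hit: no new plan built
cache.get((256,), plan)   # miss: evicts (128,), the least recently used
print(made)               # [(64,), (128,), (256,)]
```

Caching plans matters because building an FFT plan is far more expensive than executing one; the commit disables the cache for CUFFT_VERSION >= 10200, where a workaround was needed.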
  3. Fix fused_attention_op and fused_feedforward_op bug when pre_layer_no…

    …rm is false. (PaddlePaddle#36793) (PaddlePaddle#36816)
    
    * Fix bug when pre_layer_norm is false.
    limin2021 authored Oct 28, 2021
    Commit ae59223
  4. [Cherry-pick] Enable CTC grad compute on GPU (PaddlePaddle#36780)

    * Revert "Align CTC grad scale same with ESPNet (PaddlePaddle#34729)"
    
    This reverts commit 10f9644.
    
    * ctc grad compute on gpu
    zh794390558 authored Oct 28, 2021
    Commit 8ede9e6
  5. change api to support trt8 in pool3d_op_convert (PaddlePaddle#36783) (P…

    …addlePaddle#36812)
    
    * change api for support trt8
    feng_shuai authored Oct 28, 2021
    Commit 5fb2850
  6. [fix-doc-bug] Fix fused_attention_op english doc test=document_fix (P…

    …addlePaddle#36803) (PaddlePaddle#36829)
    
    * Fix fused_attention english doc test=document_fix
    limin2021 authored Oct 28, 2021
    Commit 9a96490
  7. [cherry-pick 2.2]support quantization of bert (PaddlePaddle#36820)

    * [cherry-pick 2.2]support quantization of bert
    
    support quantization for maumul_v2
    
    * Update quantization_pass.py
    XGZhang11 authored Oct 28, 2021
    Commit f20c5c9
  8. Commit 7647d40
  9. Cherry-pick-36556: add paddle.version.cuda and paddle.version.cudnn A…

    …PI (PaddlePaddle#36556) (PaddlePaddle#36795)
    
    * add paddle.version.cuda and paddle.version.cudnn API
    
    * fix little bug
    
    * fix bug
    
    * add doc string
    
    * fix mkdir error
    
    * fix windows path
    
    * fix new paddle/version path
    
    * fix unittest
    
    * fix format
    pangyoki authored Oct 28, 2021
    Commit 05b8630
  10. fix device docs;test=document_fix (PaddlePaddle#36784) (PaddlePaddle#…

    …36827)
    
    * fix device docs;test=document_fix
    
    * update __init__.py
    Ligoml authored Oct 28, 2021
    Commit 0b7f43e
  11. Commit e3db65d
  12. Commit d8ffb26
  13. Commit c716cf3

Commits on Oct 29, 2021

  1. 1. fix ifftshift(missing negative sign before shifts); (PaddlePaddle#…

    …36835)
    
    2. add complex data type support for paddle.shape at graph assembly.
    Feiyu Chan authored Oct 29, 2021
    Commit fa7aa6b
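The sign bug matters for odd-length inputs: fftshift rolls by n//2, and ifftshift must roll by -(n//2) to invert it (for even n the two coincide). A 1-D, list-based sketch in plain Python for illustration:

```python
def roll(xs, shift):
    # circularly rotate a non-empty list to the right by `shift`
    shift %= len(xs)
    return xs[-shift:] + xs[:-shift] if shift else xs[:]

def fftshift(xs):
    return roll(xs, len(xs) // 2)

def ifftshift(xs):
    # the negative sign here is what the fix restores
    return roll(xs, -(len(xs) // 2))

x = [0, 1, 2, 3, 4]          # odd length: the case that exposes the bug
assert ifftshift(fftshift(x)) == x
assert fftshift(ifftshift(x)) == x
```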
  2. Commit f2daef5
  3. Commit 09bc9c0

Commits on Nov 1, 2021

  1. Commit dcadc25
  2. [cherry-pick]fix cusparse compile bug in CUDA11.2, test=release/2.2 (P…

    …addlePaddle#36913)
    
    * fix cusparse compile bug in CUDA11.2, test=develop
    
    * fix bug
    Liu-xiandong authored Nov 1, 2021
    Commit ab2004b

Commits on Nov 8, 2021

  1. Commit 76cab75
  2. Optimized the solve op code:renamed var and removed template func (Pa…

    …ddlePaddle#36981) (PaddlePaddle#37011)
    
        Renamed the variable and function
        Removed the original template function
        Removed the tests_properties in CMakeLists.txt
    veyron95 authored Nov 8, 2021
    Commit a787b27

Commits on Nov 10, 2021

  1. Fix rnn grad bug in cpu when dropout is zero (PaddlePaddle#37080) (Pa…

    …ddlePaddle#37086)
    
    * fix rnn grad bug when num_layers is set to 2 and dropout_prob is set to 0
    
    * add more test for rnn
    joey12300 authored Nov 10, 2021
    Commit 70cb0a5

Commits on Nov 15, 2021

  1. MLPerf Optimization for Release/2.2 (PaddlePaddle#37109)

    * add mlperf optimization PRs
    
    * update
    sneaxiy authored Nov 15, 2021
    Commit 287ca7d

Commits on Nov 16, 2021

  1. Commit dc873eb
  2. fix bug of indexing with ellipsis (PaddlePaddle#37192)

    Fixes an incorrect dimension check when indexing a 1-D Tensor with an ellipsis (...).
    zyfncg authored Nov 16, 2021
    Commit 79b9f47
  3. [cherry-pick-2.2.1]fix fused_transformer_encoder_layer bug (PaddlePad…

    …dle#37229)
    
    Fixes several issues found while fine-tuning fused_transformer_encoder_layer:
    
        add attn_mask=None support to fused_attention_op: PR
        fix pre_layer_norm handling: PR
        fix parameter handling and a computation error: PR
        fix an add_bias computation error: PR
        add pure fp16 support: PR
    zhangkaihuo authored Nov 16, 2021
    Commit 36dd295

Commits on Nov 17, 2021

  1. Commit 8cb370f
  2. Commit 71b04f6
  3. Commit 3fdbab2
  4. [Paddle-Inference] fix_qkv_plugin: fix half scale (PaddlePaddle#37096) (

    PaddlePaddle#37264)
    
    * fix_qkv_plugin: half_scale
    
    * [Paddle-Inference] fix_qkv_plugin: fix half scale
    Wangzheee authored Nov 17, 2021
    Commit 027664e

Commits on Nov 19, 2021

  1. [cherry-pick]Add sparse attention doc warning (PaddlePaddle#37189)

    * fix cusparse compile bug in CUDA11.2, test=develop
    
    * modify sparse_attention docs, test=document_fix (PaddlePaddle#36554)
    
    * modify sparse_attention docs, test=develop
    
    * add warning
    
    * add warning ,test=document_fix
    Liu-xiandong authored Nov 19, 2021
    Commit 5fd8312
  2. set net.forward to original forward function in flops (PaddlePaddle#3…

    …6852) (PaddlePaddle#37357)
    
    set net.forward to original forward function in flops when net is a dy2stat model.
    0x45f authored Nov 19, 2021
    Commit b559475
  3. [Dy2stat]Support for i in [1,2,3] statements in dy2stat (PaddlePadd…

    …le#37259) (PaddlePaddle#37356)
    
    This PR enables the dynamic-to-static (dy2stat) module to correctly transform statements such as `for i in [1, 2, 3]`.
    0x45f authored Nov 19, 2021
    Commit 44db219

Commits on Nov 22, 2021

  1. fix bug to support dropout eval grad computing. (PaddlePaddle#37305) (P…

    …addlePaddle#37331)
    
    fix bug to support dropout eval grad computing. cherry-pick PaddlePaddle#37305.
    limin2021 authored Nov 22, 2021
    Commit 604b6fc
  2. [cherry-pick] Add paddle.incubate.graph_send_recv API(PaddlePaddle#37205

    ) (PaddlePaddle#37343)
    
    * Add paddle.incubate.graph_send_recv API
    
    * fix bug in CudaAtomicMin and CudaAtomicMax
    
    * add empty line
    DesmonDay authored Nov 22, 2021
    Commit 109f8a8
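Conceptually, graph_send_recv gathers features along source indices and scatter-reduces them at destination indices. A scalar-feature sketch in plain Python (sum pooling only; the real API also supports other pool types and tensor-valued features — parameter names mirror the API but the implementation is illustrative):

```python
def graph_send_recv(x, src_index, dst_index, pool_type="sum"):
    assert pool_type == "sum", "sketch implements sum pooling only"
    out = [0.0] * len(x)
    for s, d in zip(src_index, dst_index):
        out[d] += x[s]   # "send" x[s] along the edge, "recv" at node d
    return out

# edges 0->1, 1->2, 2->1 on a 3-node graph
print(graph_send_recv([1.0, 2.0, 3.0], [0, 1, 2], [1, 2, 1]))  # [0.0, 4.0, 2.0]
```

On GPU the scatter-add step is what requires the atomic min/max/add primitives that the commit fixes.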
  3. Fix a bug of quantization (PaddlePaddle#36982) (PaddlePaddle#37381)

    * fix a quantization bug
    
    Co-authored-by: XGZhang <[email protected]>
    ceci3 and XGZhang11 authored Nov 22, 2021
    Commit 9ffb43b

Commits on Nov 23, 2021

  1. Commit 6b3ffe9
  2. Commit 0fa96e9
  3. Commit 2778fcd
  4. [Dy2stat]Allow users to switch eval/train mode when using @to_static …

    …to decorate a function (PaddlePaddle#37383) (PaddlePaddle#37432)
    
    Before this PR, when a standalone function was decorated with @to_static, the generated Program could not switch between train/eval modes and always ran in train mode; as a result, GPU memory kept growing as the user called the function repeatedly after dynamic-to-static conversion.
    After this PR, a standalone function decorated with @to_static can switch between train and eval modes via function.train() or function.eval().
    0x45f authored Nov 23, 2021
    Commit eed736d
  5. [Cherry-pick 2.2]Enhance error message of scatter op (PaddlePaddle#37431

    )
    
    * enhance scatter err msg check
    
    * fix ci error
    sneaxiy authored Nov 23, 2021
    Commit d5e73f0
  6. cherry pick save/load in the_one_ps (PaddlePaddle#37461)

    * save/load in ps runtime(the_one_ps) (PaddlePaddle#36097)
    
    * add trainer desc config to distributed strategy
    
    * code style modified
    
    * data_feed set lod
    
    * fix bug
    
    * code style
    
    * fix bug
    
    * save load
    
    * save load
    
    * save unittest
    
    * add unittest of the_one_ps
    
    * unittest
    
    * add todo in communicator sendsparse
    
    * fix bug in save_inference_model (PaddlePaddle#37362)
    esythan authored Nov 23, 2021
    Commit 58a5113
  7. Commit 436808c
  8. [cherry-pick]Refactor Heterogenous Pipeline Parameter Server (PaddleP…

    …addle#37446)
    
    * bug fix for  DeserializeSelectedRows. test=develop (PaddlePaddle#36520)
    
    * fix SerializeSelectedRows (PaddlePaddle#36543)
    
    * bug fix for  DeserializeSelectedRows. test=develop
    
    * fix bug for SerializeSelectedRows. test=develop
    
    * update. test=develop
    
    * [Heterps]Refactor Heter Pipeline Parameter Server (PaddlePaddle#36845)
    
    * change username
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    * update
    
    * update
    
    * update unittests
    
    * fix
    
    * update
    
    * fix
    
    * update
    
    * fix
    
    * fix
    
    * fix
    
    * update
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update send_and_recv op. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix ut. test=develop
    
    * fix unit. notest,test=coverage
    
    * fix ut. notest, test=coverage
    
    * update. notest,test=coverage
    
    * fix ut. notest, test=coverage
    
    * fix ut. notest, test=coverage
    
    * fix. notest, test=coverage
    
    * fix. notest, test=coverage
    
    * fix ut. notest, test=coverage
    
    * fix ut. notest, test=coverage
    
    * fix ut. notest, test=coverage
    
    * fix ut. notest, test=coverage
    
    * add func. notest, test=coverage
    
    * fix ut. notest, test=coverage
    
    * fix. test=develop
    
    * fix. test=develop
    
    * Fix unit test for send_and_recv_cpu & send_and_recv_gpu (PaddlePaddle#37129)
    
    * [heterps]fix ut for heter_pipeline_trainer.cc  (PaddlePaddle#37136)
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * [heterps]bug fix for local training with --heter_worker_num (PaddlePaddle#37166)
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * [heterps]Refactor heterogenous worker (PaddlePaddle#37244)
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * refactor heter trainer. test=develop
    
    * fix. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * [heterps]add heterps mode judgement (PaddlePaddle#37298)
    
    * [heterps]change default executor for heter trainer (PaddlePaddle#37314)
    
    * fix pslib. test=develop
    
    * add device to train_from_dataset. test=develop
    
    * refine fleet.stop_worker. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * fix executor & ut. test=develop
    
    * fix executor & ut. test=develop
    
    * fix executor & ut. test=develop
    
    * [heterps]remove api for heter pipeline ps (PaddlePaddle#37396)
    
    * fix api. test=develop
    
    * fix api. test=develop
    
    * fix code style. test=release/2.2
    
    * fix CMakeLists. test=develop (PaddlePaddle#37454)
    zmxdream authored Nov 23, 2021
    Commit 4dc426f
  9. Commit f873d3a

Commits on Nov 24, 2021

  1. [Cherry pick 2.2] fix bugs to support bias add none for fused_attent…

    …ion op. (PaddlePaddle#37411) (PaddlePaddle#37483)
    
    Add support for bias being None in the fused_attention op.
    limin2021 authored Nov 24, 2021
    Commit bed652d

Commits on Nov 25, 2021

  1. Cherry-pick PR 37420, fix inplace bug when the first grad_var(loss_gr…

    …ad) is inplace var (PaddlePaddle#37420) (PaddlePaddle#37488)
    
    fix inplace bug,Cherry pick PR PaddlePaddle#37420
    pangyoki authored Nov 25, 2021
    Commit d31d597
  2. [cherry-pick-2.2.1]Opt topk (PaddlePaddle#37325)

    The existing fused_attention_op does not support attn_mask=None as input; this PR adds that support, along with the corresponding unit-test logic.
    zhangkaihuo authored Nov 25, 2021
    Commit 89fb196
  3. Commit 824c4ef
  4. [cherry-pick 2.2]fix data parallel when VOCAB var in program (Paddle…

    …Paddle#37546)
    
    * fix data parallel when VOCAB var in program
    
    * fix ci coverage
    Steffy-zxf authored Nov 25, 2021
    Commit c8429d3

Commits on Nov 26, 2021

  1. Commit ca8b858
  2. [cherry-pick 2.2 heterps]bug fix for launch_utils.py (PaddlePaddle#37521

    ) (PaddlePaddle#37570)
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * [heterps]bug fix for _run_from_dataset
    
    * fix heter_server.cc
    
    * fix launch_utils.py
    
    * fix heter_section_worker.cc
    
    * fix. test=develop
    
    * fix. test=develop
    zmxdream authored Nov 26, 2021
    4b41b8e
  3. 3a81805
  4. fix bug of slice_grad using use_mkldnn attr (PaddlePaddle#37584)

    The slice_grad op hit an error during kernel selection: when fetching the use_mkldnn attribute, the key was not present in the map, so an out_of_range exception was thrown.
    This PR adds a check for the key's existence before reading the use_mkldnn attribute from the map, avoiding the exception.
    zyfncg authored Nov 26, 2021
    14fd53d
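    The guard described in that fix (check a map for the key before reading it, so a missing attribute cannot throw) can be sketched in Python; the helper and attribute names below are illustrative stand-ins, not Paddle's actual AttributeMap API:

    ```python
    def get_attr(attrs, name, default):
        # Reading a missing key with attrs[name] would raise KeyError --
        # the Python analogue of std::map::at throwing std::out_of_range
        # in the C++ op. Checking membership first avoids the exception.
        if name in attrs:
            return attrs[name]
        return default

    # Hypothetical slice_grad attribute map with no "use_mkldnn" entry.
    attrs = {"axes": [0], "starts": [0], "ends": [3]}
    use_mkldnn = get_attr(attrs, "use_mkldnn", False)  # default, no exception
    ```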

Commits on Nov 28, 2021

  1. 4066713

Commits on Nov 29, 2021

  1. Fix bugs when bias add is none in static graph for fused_attention op. (P…

    …addlePaddle#37566) (PaddlePaddle#37608)
    
    cherry-pick of PR PaddlePaddle#37566:
    
    Based on PaddlePaddle#37411, this PR:
    
        Continue to fix the bugs when bias add is none in static graph for fused_attention op.
        Polish and improve the unittests in test_fused_attention_op_api.py.
    limin2021 authored Nov 29, 2021
    46988e2
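    The bug class behind these two fused_attention fixes is an unguarded bias add; a minimal pure-Python sketch of the optional-bias pattern (the function below is a stand-in invented for illustration, not the fused_attention kernel):

    ```python
    def linear_with_optional_bias(x, weight, bias=None):
        # x: list of rows; weight: in_dim x out_dim matrix as nested lists.
        out = [[sum(xi[k] * weight[k][j] for k in range(len(weight)))
                for j in range(len(weight[0]))] for xi in x]
        # The bias add must be guarded: with bias=None there is nothing to
        # add, so both the forward op and its grad need an explicit branch.
        if bias is not None:
            out = [[v + b for v, b in zip(row, bias)] for row in out]
        return out

    x = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
    w = [[1.0] * 4 for _ in range(3)]
    no_bias = linear_with_optional_bias(x, w)               # every entry 3.0
    with_bias = linear_with_optional_bias(x, w, [1.0] * 4)  # every entry 4.0
    ```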
  2. fix pass_desc.proto compilation error, test=develop (PaddlePaddle#37614)

    cherry-pick PaddlePaddle#37536
    
    Fixes a dependency issue that occurred when compiling pass_desc.proto.
    Avin0323 authored Nov 29, 2021
    7d9c669
  3. Fix dropout static when axis != None (PaddlePaddle#37223) (PaddlePadd…

    …le#37589)
    
    * fix dropout static when axis != None
    
    * update dropout test
    
    * add dropout test
    
    * fix test
    
    * Update test_dropout_op.py
    
    * Update test_dropout_op.py
    
    * fix testcase
    
    * fix testcase
    
    * Update test_dropout_op.py
    
    * fix testcase
    
    * fix testcase
    
    * optimize perf
    
    * add new test
    
    * fix testcase
    smallv0221 authored Nov 29, 2021
    3a0c550

Commits on Nov 30, 2021

  1. a5cf2e3

Commits on Dec 1, 2021

  1. cherry-pick to 2.2 (PaddlePaddle#37238)

    * py2 to py3 bug and iface fix for pslib (PaddlePaddle#36102)
    
    * avoid setting logging.basicConfig (PaddlePaddle#37031)
    kuizhiqing authored Dec 1, 2021
    fe43bee

Commits on Dec 3, 2021

  1. 56b1ccb
  2. 6ece0b1

Commits on Dec 6, 2021

  1. 615b33f

Commits on Dec 7, 2021

  1. Fix cflags D_GLIBCXX_USE_CXX11_ABI takes no effect problem in customi…

    …zed op (PaddlePaddle#37878) (PaddlePaddle#37899)
    
    Fix cflags D_GLIBCXX_USE_CXX11_ABI takes no effect problem in customized op
    Aurelius84 authored Dec 7, 2021
    81be365
  2. Fix default behavior if block=None in static mode (PaddlePaddle#37827) (

    PaddlePaddle#37898)
    
    Fix default behavior if block=None in static mode (PaddlePaddle#37827)
    Aurelius84 authored Dec 7, 2021
    72a6c14

Commits on Dec 8, 2021

  1. 4114c4a

Commits on Dec 9, 2021

  1. 026de65

Commits on Dec 10, 2021

  1. a4c0c71
  2. 8b86aad
  3. fix: when ceil_mode==true && Padding_algo!=SAME, (x-size)/stride != …

    …int, this conversion is wrong (PaddlePaddle#37929) (PaddlePaddle#38033)
    
    Co-authored-by: feng_shuai <[email protected]>
    shangzhizhou and feng_shuai authored Dec 10, 2021
    0e5846c
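    The off-by-one that title describes comes from pooling output-size arithmetic: floor and ceil modes only agree when the window span divides the stride evenly. A minimal sketch of the generic 1-D formula (illustrative only, not the converter code itself):

    ```python
    import math

    def pool_out_size(in_size, ksize, stride, pad, ceil_mode):
        # Output length of a 1-D pooling op. With ceil_mode the last
        # partial window is kept, so the two modes differ exactly when
        # (in_size + 2*pad - ksize) is not divisible by stride.
        span = in_size + 2 * pad - ksize
        if ceil_mode:
            return math.ceil(span / stride) + 1
        return span // stride + 1

    # 10-wide input, 3-wide window, stride 2, no padding:
    # (10 - 3) / 2 = 3.5, so floor gives 4 outputs and ceil gives 5.
    floor_out = pool_out_size(10, 3, 2, 0, False)  # 4
    ceil_out = pool_out_size(10, 3, 2, 0, True)    # 5
    ```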

Commits on Dec 12, 2021

  1. Remove additional warning in layer.to (PaddlePaddle#36700)

    * remove additional warning in layer.to
    
    * remove additional warning in layer.to
    
    * remove additional warning in layer.to
    
    * remove additional warning in layer.to
    
    * remove additional warning in layer.to
    JiabinYang authored and zhangbo9674 committed Dec 12, 2021
    721e78c
  2. Refine param conversion logic in layer.to (PaddlePaddle#36862)

    * refine layer to
    
    * delete comment
    
    * refine logic
    
    * refine code
    
    * refine pure_fp16_init
    
    * refine comment
    zhangbo9674 committed Dec 12, 2021
    c4df875
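    The param-conversion idea this PR refines (walk a layer's parameters and convert only those not already in the target dtype, avoiding redundant casts and warnings) can be illustrated with a toy sketch; the classes below are stand-ins invented for illustration, not Paddle's Layer/Parameter API:

    ```python
    class Param:
        def __init__(self, data, dtype):
            self.data = data
            self.dtype = dtype

    class ToyLayer:
        """Toy stand-in for a framework layer holding named parameters."""
        def __init__(self, params):
            self.params = params

        def to(self, dtype=None):
            # Convert each parameter in place, skipping ones already in
            # the target dtype -- the kind of redundant-conversion check
            # a refined Layer.to can use to avoid extra casts.
            for p in self.params.values():
                if dtype is not None and p.dtype != dtype:
                    p.dtype = dtype  # a real framework also casts p.data
            return self

    layer = ToyLayer({"w": Param([1.0, 2.0], "float32")})
    layer.to(dtype="float16")  # "w" is converted; a second call is a no-op
    ```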
  3. 7b7e8de