
[cherry pick] Refine param conversion logic in layer.to #38058

Closed
wants to merge 195 commits

Commits on Sep 22, 2021

  1. Commit f72d52e
  2. Commit fb8be03
  3. [cherry-pick] Increase test_imperative_auto_mixed_precision PROPERTIES TIMEOUT (PaddlePaddle#35863) (PaddlePaddle#35898)

    Increase the test_imperative_auto_mixed_precision PROPERTIES TIMEOUT from 120s to 300s.
    zhangbo9674 authored Sep 22, 2021 (commit 1787936)
  4. [cherry-pick] Fix bug of module 'paddle' has no attribute 'fluid' for Python 3.6 (PaddlePaddle#35862) (PaddlePaddle#35900)

    Fix the "module 'paddle' has no attribute 'fluid'" error under Python 3.6.
    zhangbo9674 authored Sep 22, 2021 (commit c053520)
  5. Commit 2aaa417
  6. Commit bba41e4
  7. Commit 6cc8b16
  8. Commit 0f34483
  9. Commit c67cf85

Commits on Sep 23, 2021

  1. Commit e8e77eb
  2. op:transpose_op supports bool type (PaddlePaddle#35886) (PaddlePaddle#35926)

    * Pass compat of conv_transpose_bias_mkldnn_fuse_pass

    * Fix a bug in strided_slice op where the axes parameter accessed memory out of bounds

    * Fix a bug in transpose op where the perm parameter accessed memory out of bounds

    * op:transpose_op supports bool type
    TeslaZhao authored Sep 23, 2021 (commit 95c100c)
  3. Commit 91f25ee
  4. Commit 4629401

Commits on Sep 24, 2021

  1. Commit 063fca8
  2. [cherry-pick] Fix cusparse compile bug on Windows with CUDA 11.2, test=release/2.2 (PaddlePaddle#36015)

    Fix a compilation error with CUDA 11.2 on Windows.
    Cherry-pick of PaddlePaddle#35941.
    Liu-xiandong authored Sep 24, 2021 (commit 0e19aeb)
  3. [cherry-pick] inference: fix TensorRT problem (PaddlePaddle#35939)

    * update xpu version
    jiweibo authored Sep 24, 2021 (commit ae78940)
  4. Basic PR on Cost Model (PaddlePaddle#35774) (PaddlePaddle#35915)

    Add a basic Cost Model: it uses the executor to run a program and profiles it to obtain per-op time.

    This is an early, basic version; more functionality will be added in the future.
    zhhsplendid authored Sep 24, 2021 (commit efcd108)
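The cost-model idea above (run the program, record how long each op takes) can be sketched in plain Python. This is a toy illustration of the profiling loop only; the function and the two-op "program" below are hypothetical stand-ins, not Paddle's actual CostModel API.

```python
import time

def profile_ops(ops, repeat=10):
    """Run each (name, fn) "op" `repeat` times; return mean wall-clock seconds."""
    costs = {}
    for name, fn in ops:
        start = time.perf_counter()
        for _ in range(repeat):
            fn()
        costs[name] = (time.perf_counter() - start) / repeat
    return costs

# A hypothetical two-op "program" standing in for an executor-run program.
def matmul_like():
    return [sum(i * j for j in range(50)) for i in range(50)]

def relu_like():
    return [max(x, 0.0) for x in range(-50, 50)]

costs = profile_ops([("matmul", matmul_like), ("relu", relu_like)])
```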
  5. [cherry-pick] Replace Eigen with the Lapack library for the eigvals OP kernel (PaddlePaddle#35909) (PaddlePaddle#36038)

    This PR implements the kernel of the "eigvals" OP with the Lapack library, which performs better than the previous Eigen-based implementation.
    From00 authored Sep 24, 2021 (commit e9c0414)

Commits on Sep 25, 2021

  1. Commit 33fbdaf

Commits on Sep 26, 2021

  1. Commit 085eae2
  2. [cherry-pick] Split minimize and add unscale_ for GradScaler (PaddlePaddle#35927)

    1. Split GradScaler::minimize() into GradScaler::step() + GradScaler::update().
    2. Add GradScaler::unscale_(optimizer).
    zhangbo9674 authored Sep 26, 2021 (commit e262125)
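The minimize() split above can be illustrated with a pure-Python mock of the control flow. This is a sketch of the pattern only, not Paddle's GradScaler implementation; the optimizer step and scale handling are deliberately simplified.

```python
class MockGradScaler:
    """Mock of the split: minimize() used to do everything; now step()
    applies the optimizer update (unscaling first if needed) and update()
    finishes the iteration."""

    def __init__(self, init_scale=2.0 ** 15):
        self._scale = init_scale
        self._unscaled = False

    def scale(self, loss):
        # Multiply the loss so small fp16 gradients do not underflow.
        return loss * self._scale

    def unscale_(self, grads):
        # Divide gradients by the scale, at most once per iteration.
        if not self._unscaled:
            for i in range(len(grads)):
                grads[i] /= self._scale
            self._unscaled = True

    def step(self, grads, params, lr=0.1):
        self.unscale_(grads)          # no-op if the user already called it
        for i in range(len(grads)):
            params[i] -= lr * grads[i]

    def update(self):
        # A real scaler would also grow/shrink self._scale here.
        self._unscaled = False

params = [1.0]
grads = [0.5 * 2.0 ** 15]             # gradient computed from the scaled loss
scaler = MockGradScaler()
scaler.unscale_(grads)                # optional: inspect/clip unscaled grads
scaler.step(grads, params)
scaler.update()
```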
  3. fix pad tuple (PaddlePaddle#36043)

    * fix pad tuple

    * fix format
    littletomatodonkey authored Sep 26, 2021 (commit 2e473f2)
  4. [NPU] add randperm_op_npu (PaddlePaddle#35763) (PaddlePaddle#36026)

    * add randperm_op_npu

    * fix test_set_value_op_npu
    ronny1996 authored Sep 26, 2021 (commit df81915)
  5. [Cherry-Pick] Add paddle.linalg.solve OP (PaddlePaddle#35715) (PaddlePaddle#36056)

    This PR adds linalg.solve to Paddle's linear algebra module. Call paddle.linalg.solve to use it.
    veyron95 authored Sep 26, 2021 (commit 6b4f2fb)
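A solve API of this kind computes x with A x = b directly, rather than forming inv(A) @ b. A numpy sketch of the same semantics (assuming numpy is available; the Paddle API takes tensors instead of arrays):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)     # solved via factorization, not inv(A) @ b
```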
  6. [cherry-pick] Add function comments and instructions to the Primitive API (PaddlePaddle#36024)

    AnnaTrainingG authored Sep 26, 2021 (commit 05621f7)
  7. [cherry-pick] Add Det and Slogdet API to Release 2.2 (PaddlePaddle#36083)

    This PR adds the det and slogdet APIs to release/2.2.
    Cherry-picked from PaddlePaddle#34992 and PaddlePaddle#36013.
    zhhsplendid authored Sep 26, 2021 (commit ba2a1bb)
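Why offer slogdet alongside det: for matrices with very small (or very large) determinants, det overflows or underflows in float64, while slogdet returns the sign and log|det| stably. A numpy sketch of the same pair of APIs (assuming numpy is available):

```python
import numpy as np

# For a 400x400 diagonal matrix with entries 0.1, det == 1e-400 underflows
# to 0.0 in float64, while slogdet stays informative.
A = np.diag(np.full(400, 0.1))
sign, logabsdet = np.linalg.slogdet(A)
naive = np.linalg.det(A)              # underflows
```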
  8. Commit 14cdcde
  9. [cherry-pick] CPU forward calculation replaces Eigen with Lapack (PaddlePaddle#35916) (PaddlePaddle#36091)

    Cherry-pick of PaddlePaddle#35916: replace Eigen with Lapack in the CPU forward computation and adjust the linalg exposure rules.
    Zjq9409 authored Sep 26, 2021 (commit effb70f)
  10. Commit bc13ab9

Commits on Sep 27, 2021

  1. [cherry-pick] Support fixed seed in Python for test (PaddlePaddle#36065) (PaddlePaddle#36094)

    Users of gumbel_softmax can call paddle.seed() in Python to fix the random seed.
    YuanRisheng authored Sep 27, 2021 (commit c3a0eaa)
  2. [cherry-pick] Modify adam to adamw in Optimizer AdamW (PaddlePaddle#36028) (PaddlePaddle#36103)

    PR 35521 inadvertently changed the op used by the AdamW optimizer from adamw to adam; this change restores adamw.
    zhangbo9674 authored Sep 27, 2021 (commit 2de7a7f)
  3. Commit 6891134
  4. [Cherry-pick] Add new func/class API psroi_pool and UT (PaddlePaddle#36111)

    Cherry-picked from PaddlePaddle#35352.

    Add the new detection APIs paddle.vision.ops.psroi_pool and paddle.vision.ops.PSRoIPool.
    zoooo0820 authored Sep 27, 2021 (commit 81557da)
  5. Commit fe5cddf
  6. Commit 40a2918
  7. Commit 4bcff7b
  8. Commit b171aab
  9. Commit 5f168af
  10. remove linalg api in paddle.__init__ (PaddlePaddle#36112)

    Remove the recently added linalg APIs from paddle.__init__;
    add a 'name' argument to some of the new linalg API interfaces.
    zhiboniu authored Sep 27, 2021 (commit a57f081)
  11. Commit 45b7627
  12. Commit 1db28fd
  13. cherry-pick PaddlePaddle#36021: fix unique/unstack zero tensor (PaddlePaddle#36163)

    * fix unique/unstack for dim 0

    * fix unique_op format
    bjjwwang authored Sep 27, 2021 (commit 749bc24)
  14. Add paddle.device.cuda.get_device_properties (PaddlePaddle#35875)

    * Initial Commit

    * fix py2 error

    * fix wrong words and doc

    * test=document_fix

    * fix _gpuDeviceProperties
    Yanxing-Shi authored Sep 27, 2021 (commit cea0bc2)

Commits on Sep 28, 2021

  1. Commit c576169
  2. [cherry-pick] update multi_dot exposure rules (PaddlePaddle#36018) (PaddlePaddle#36131)

    Update the multi_dot API exposure following the linear algebra library's exposure rules:
    1. Implement it under python/paddle/tensor/linalg.py.
    2. Import it in python/paddle/linalg.py and add it to the __all__ list.
    3. Import it in python/paddle/tensor/__init__.py and add it to the tensor_method_func list.
    4. Remove the import from python/paddle/__init__.py.
    zkh2016 authored Sep 28, 2021 (commit 632a006)
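For what multi_dot itself does: it multiplies a chain of matrices while choosing the cheapest parenthesization. paddle.linalg.multi_dot mirrors numpy's API, sketched here with numpy (assuming numpy is available):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 100))
B = rng.standard_normal((100, 5))
C = rng.standard_normal((5, 50))

# multi_dot picks the cheapest parenthesization of the chain, here (A B) C:
# (10x100)(100x5) then (10x5)(5x50) is far cheaper than B C first.
out = np.linalg.multi_dot([A, B, C])
```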

Commits on Sep 29, 2021

  1. Add roi pool (PaddlePaddle#35084) (PaddlePaddle#36154)

    * add roi pool

    * rename input as x
    lyuwenyu authored Sep 29, 2021 (commit b0289de)
  2. [cherry-pick] fix paddle.device.cuda.get_device_properties doc (PaddlePaddle#36174)

    * test=document_fix
    Yanxing-Shi authored Sep 29, 2021 (commit dd14f7f)
  3. Add op paddle.device.cuda.get_device_name and paddle.device.cuda.get_device_capability (PaddlePaddle#36172)

    * add get_device_name and get_device_capability

    * fix docs
    liyagit21 authored Sep 29, 2021 (commit 96fd98b)
  4. add API paddle.linalg.eig (PaddlePaddle#35674) (PaddlePaddle#36188)

    Add the eig operator to PaddlePaddle's linear algebra library; it computes the eigendecomposition of a general square matrix.
    Cherry-picked from PaddlePaddle#35674.
    AshburnLee authored Sep 29, 2021 (commit 4e2daa9)
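An eig API of this kind returns eigenvalues and right eigenvectors of a general (not necessarily symmetric) square matrix, satisfying A v_i = w_i v_i. A numpy sketch of the same contract (assuming numpy is available):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])        # general non-symmetric square matrix
w, v = np.linalg.eig(A)           # eigenvalues w; eigenvectors in columns of v
```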

Commits on Sep 30, 2021

  1. [cherry-pick] add roi align (PaddlePaddle#36207)

    Add roi_align; cherry-pick of PaddlePaddle#35102.
    nemonameless authored Sep 30, 2021 (commit dcd17d6)
  2. Commit 87cc8d4
  3. Commit e8efba5
  4. Commit 789012c
  5. Fix raw optim (PaddlePaddle#36176) (PaddlePaddle#36231)

    * fix raw optim

    * pre-commit test file

    Co-authored-by: sneaxiy <[email protected]>
    youth123 and sneaxiy authored Sep 30, 2021 (commit 28d1200)
  6. add optest for adamw (PaddlePaddle#36148) (PaddlePaddle#36239)

    * update func name

    * skip cpu

    * update unittest
    zhaoyinglia authored Sep 30, 2021 (commit 70e6784)

Commits on Oct 11, 2021

  1. [cherry-pick] fix hasattr(paddle.fluid.ir.PassDesc.OP, '__name__') error (PaddlePaddle#36294)

    For arguments that do not satisfy the conditions after overloading __getattr__, raise AttributeError in every case, matching the behavior of the non-overloaded version.

    (cherry picked from PR PaddlePaddle#36229)
    Avin0323 authored Oct 11, 2021 (commit 45de931)
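The contract behind this fix: hasattr(obj, name) returns False only when attribute lookup raises AttributeError, so an overloaded __getattr__ must raise AttributeError (not some other exception, and not return junk) for every unsupported name. A minimal sketch in plain Python; the class and op registry are illustrative, not Paddle's PassDesc:

```python
class OpProxy:
    """Illustrative attribute proxy with an op-name registry."""

    _registered = {"conv2d", "matmul"}

    def __getattr__(self, name):
        # Called only when normal lookup fails.
        if name in self._registered:
            return "<op {}>".format(name)
        # Raise AttributeError for every unsupported name, matching the
        # behavior of a class without the overload; hasattr() then works.
        raise AttributeError(name)

ops = OpProxy()
```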
  2. [cherry-pick] C++ support register pass via PassDesc (PaddlePaddle#36302)

    (cherry picked from PR PaddlePaddle#36095)

    Main functionality: support registering a GeneratePass from C++, simplifying development for subgraph-optimization scenarios such as fusion.
    Avin0323 authored Oct 11, 2021 (commit 21c65f6)
  3. Commit 31a5829

Commits on Oct 12, 2021

  1. Commit 10eebfa
  2. Fix stop_gradient in RunProgramOp (PaddlePaddle#36339) (PaddlePaddle#36353)

    * Fix stop_gradient in RunProgramOp

    * fix reference
    Aurelius84 authored Oct 12, 2021 (commit a6868c9)

Commits on Oct 13, 2021

  1. Commit ce6a27d
  2. [cherry-pick] change paddle.mm api to matmul_v2 op (PaddlePaddle#36374)

    * change paddle.mm to matmul_v2

    * update the code for mm

    * update the document for mm
    wawltor authored Oct 13, 2021 (commit 7a66160)
  3. delete remove_static_file() function in error.py (PaddlePaddle#36153) (PaddlePaddle#36375)

    * change time to remove static tempfile

    * delete remove_static_file() function
    0x45f authored Oct 13, 2021 (commit a5767bb)

Commits on Oct 14, 2021

  1. Commit 976f014

Commits on Oct 15, 2021

  1. [cherry-pick] add sparse_embedding doc (PaddlePaddle#36312)

    * add sparse_embedding doc

    * modify sample code

    * fix sample code error
    Yanxing-Shi authored Oct 15, 2021 (commit fc429fe)
  2. [cherry-pick] Verify the correctness of graph rewritten by GeneratePass (PaddlePaddle#36453)

    * [WIP] Verify the correctness of graph rewritten by GeneratePass, test=develop

    * add delete subgraph and unittest, test=develop

    * check simple pass, test=develop

    * fix coverage, test=develop

    * limit with input_spec via Paddle API, test=develop
    Avin0323 authored Oct 15, 2021 (commit cc44965)

Commits on Oct 18, 2021

  1. [Cherry-pick][Dy2stat] fix no_grad context error in train mode when using save/load (PaddlePaddle#36434) (PaddlePaddle#36463)

    Fix an issue where, after loading a model with the jit.save/load interface, GPU memory keeps growing in train mode inside a no_grad context.
    0x45f authored Oct 18, 2021 (commit 2b9d192)

Commits on Oct 19, 2021

  1. Add operators for async read & async write (PaddlePaddle#36333) (PaddlePaddle#36501)
    
    * fix async_read bug
    
    * change index place to cpu
    
    * add tensor size judge
    
    * add async_read & async_write test
    
    * fix bug in async_write
    
    * fix mac py3 ci
    
    * fix bug for cpu version paddle
    
    * fix windows ci bug
    
    * change input argument error type
    
    * change const_cast to mutable_data
    
    * add async_write out-of-bound check and consumate error hint
    
    * fix a small bug for dst_tensor
    
    * add docs and refine codes
    
    * refine docs
    
    * notest,test=windows_ci
    
    * fix windows ci
    
    * fix require
    
    * fix code-block
    
    * add core.is_compiled_with_cuda()
    DesmonDay authored Oct 19, 2021 (commit d65f8af)
  2. quant support matmul_v2 (PaddlePaddle#36469) (PaddlePaddle#36499)

    * quant support matmul_v2

    * fix format
    ceci3 authored Oct 19, 2021 (commit b8167ed)
  3. Commit d974dbd
  4. [cherry-pick] Add sparse attention cherrypick (PaddlePaddle#36447)

    The code in this PR supports only CUDA 11.2. CI currently has no GPU with CUDA 11.2, so all tests are skipped automatically.

    The new OP is paddle._C_ops.sparse_attention. The Python API will be added in a follow-up PR.

    This PR lacks tests on dynamic and static graphs; they will be added in subsequent PRs.
    Liu-xiandong authored Oct 19, 2021 (commit 36edb0e)

Commits on Oct 20, 2021

  1. catch the generator function and intercept it (PaddlePaddle#35369) (PaddlePaddle#36536)

    * catch the generator function and intercept it

    * add test generator

    * add test case

    * refine the test case
    2742195759 authored Oct 20, 2021 (commit 023eb3f)
  2. Commit b5404f0

Commits on Oct 21, 2021

  1. remove no_value using var.name (PaddlePaddle#36513) (PaddlePaddle#36565)

    * remove no_value using var.name
    0x45f authored Oct 21, 2021 (commit 6a20205)
  2. improve replicate pad error information (PaddlePaddle#36531)

    * fix replicate pad when input size is 0

    * add unit test
    littletomatodonkey authored Oct 21, 2021 (commit a201a69)
  3. [Cherry-pick] Add functor_primitives.h for kernel primitive api (PaddlePaddle#36418)

    * Add functor_primitives.h for the kernel primitive API
    AnnaTrainingG authored Oct 21, 2021 (commit 3090988)

Commits on Oct 22, 2021

  1. Fix a bug in ReadData, ReadDataBc and ReadDataReduce when NX != 1 (PaddlePaddle#36373) (PaddlePaddle#36616)

    * Fix a bug in ReadData, ReadDataBc and ReadDataReduce when NX != 1
    * Update the implementation of reduceAnyKernel according to the kernel primitive API
    AnnaTrainingG authored Oct 22, 2021 (commit 6840cf5)

Commits on Oct 23, 2021

  1. Add viterbi decode (PaddlePaddle#35778) (PaddlePaddle#36615)

    * add viterbi decode cpu kernel
    
    * add viterbi decoder api in paddle.text
    
    * add a data buffer once to avoid create many small pieces of data buffer frequently
    
    * fix viterbi max_seq_length bug
    
    * fix seq_len=1 bug
    
    * fix device context
    
    * move split out of for loop
    
    * remove INVERSE_SUB
    
    * remove 2 GET_CAST_MASK
    
    * remove 1 loop
    
    * remove Functor
    
    * add to_static deploy code
    
    * use MAX_FUNC instead of ELE_MAX
    
    * add MaxFunctor
    
    * impl max_func
    
    * remove MaxFunctor
    
    * remove cast op
    
    * use REGISTER_OP_WITHOUT_GRADIENT
    
    * add viterbi cuda kernel
    
    * add FIX_BLOCKDIM_CASE macro
    
    * add MKL add, mul; add get data mask
    
    * add arange mkl impl
    
    * add CPU Argmax
    
    * add cpu gather
    
    * use EXECUTE_MKL_ELEMENT_BINARY_OP instead of some ADD, MUL
    
    * use SameDimsBinaryOP instead of EXECUTE_MKL_ELEMENT_BINARY_OP
    
    * use SAME_DIMS_ELEMENT_BINARY_OP
    
    * add SimpleBroadcastBinaryOP
    
    * use int instead of int64_t to accelerate
    
    * optimize SimpleBroadcastBinaryOP
    
    * optimize SimpleBroadcastBinaryOP
    
    * optimize performance in both single thread and multithread situation
    
    * remove useless line
    
    * remove useless code
    
    * add CREATE_TENSOR_BUFFER macro
    
    * add INIT_REQUIRED_TENSOR macro
    
    * add comment
    
    * fix windows ci
    
    * add viterbi unittest
    
    * remove cuda add functor
    
    * remove cuda equal
    
    * remove a template function
    
    * fix windows ci
    
    * fix windows dtype
    
    * remove some template instance
    
    * remove useless header file
    
    * remove some blockdim
    
    * remove transpose impl
    
    * accelerate cpu performance on single thread situation
    
    * viterbi_decode->crf_decode
    
    * rename crf params name
    
    * add viterbi api test
    
    * remove useless import
    
    * add enable_static
    
    * use viterbi decoder
    
    * fix viterbi len=1
    
    * fix viterbi unittest
    
    * remove useless comments
    
    * reconstruct viterbi decode
    
    * remove ADD,SUB,MUL structure
    
    * fix coverage
    
    * remove CREATE_TENSOR
    
    * add name args
    
    * crf.py->ops.py; with_start_stop_tag->include_start_end_tag
    
    * update crf_decode en docs
    
    * fix viterbi decode en docs
    
    * fix some review comments
    
    * add FIXED_BLOCK_DIM_CASE in cuda
    
    * push_back->emplace_back
    
    * crf_decode->viterbi_decode; include_start_end_tag->include_bos_eos_tag
    
    * paddle.text.ops.viterbi_decode->paddle.text.viterbi_decode
    
    * fix viterbi_decode en docs
    joey12300 authored Oct 23, 2021 (commit 1906c74)
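For readers unfamiliar with the algorithm behind the commit above: Viterbi decoding finds the highest-scoring tag sequence given per-step emission scores and tag-to-tag transition scores via dynamic programming plus backtracking. A minimal numpy sketch of the algorithm itself (assuming numpy is available), not Paddle's kernel, which also handles batching, sequence lengths, and BOS/EOS tags:

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Best tag path for one sequence.

    emissions: [T, N] per-step tag scores; transitions: [N, N] with
    transitions[i, j] = score of moving from tag i to tag j.
    Returns (best_score, best_path)."""
    T, N = emissions.shape
    score = emissions[0].copy()            # best score ending in each tag
    backpointers = []
    for t in range(1, T):
        cand = score[:, None] + transitions  # [N, N] candidate scores
        backpointers.append(cand.argmax(axis=0))
        score = cand.max(axis=0) + emissions[t]
    path = [int(score.argmax())]
    for prev in reversed(backpointers):    # walk the backpointers
        path.append(int(prev[path[-1]]))
    path.reverse()
    return float(score.max()), path

em = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
tr = np.array([[0.5, -0.5], [-1.0, 1.0]])
score, path = viterbi_decode(em, tr)
```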

Commits on Oct 25, 2021

  1. Add fused_dropout wrapper to ease use (PaddlePaddle#36185) (PaddlePaddle#36640)

    The fused_attention and fused_ffn ops use fused bias_add+dropout+residual+layernorm or bias_add+dropout+residual kernels. To ease use of these kernels, this PR provides a wrapper.
    1. To reuse the increment-computing code, the corresponding code is extracted into the "GetSeedDataAndIncrement" routine in dropout_impl_util.h.
    2. fused_dropout_helper.h provides the fused dropout kernel wrapper.

    Note: tests for this wrapper will be provided in the upcoming fused_attention_op and fused_ffn PRs.
    limin2021 authored Oct 25, 2021 (commit 05d7e2f)
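The math the fused kernel covers can be written out op by op. A numpy sketch of the composition only (assuming numpy is available); the function name and arguments are illustrative, and the real kernel fuses these steps into one CUDA launch with its own RNG:

```python
import numpy as np

def bias_dropout_residual_layernorm(x, bias, residual, p, gamma, beta,
                                    seed=0, eps=1e-5):
    """out = layernorm(residual + dropout(x + bias)), composed step by step."""
    rng = np.random.default_rng(seed)
    y = x + bias
    keep = (rng.random(y.shape) >= p).astype(y.dtype)
    y = y * keep / (1.0 - p)                 # inverted dropout
    y = y + residual                          # residual connection
    mean = y.mean(axis=-1, keepdims=True)     # layernorm over the last axis
    var = y.var(axis=-1, keepdims=True)
    return gamma * (y - mean) / np.sqrt(var + eps) + beta

x = np.ones((2, 4))
out = bias_dropout_residual_layernorm(x, np.zeros(4), np.ones((2, 4)),
                                      p=0.5, gamma=1.0, beta=0.0)
```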
  2. Add fused_attention_op: add impl wrappers (PaddlePaddle#35903) (PaddlePaddle#36673)

    Goal: improve the compute performance of the attention module.
    To reduce the framework's op-scheduling overhead, this PR hand-implements the attention module at the C++ level and exposes it as a single large attention op.
    To reduce memory-access overhead, it applies two optimizations:
    (1) when computing q, k and v, sharing the input X reduces the gemm, transpose and bias-add there from three calls to one;
    (2) kernel fusion passes data between CUDA kernels through registers.
    limin2021 authored Oct 25, 2021 (commit 8c0bacd)
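Optimization (1) above, fusing the q/k/v projections that share input X into one gemm, can be checked numerically. A numpy sketch (assuming numpy is available; shapes and weight names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
X = rng.standard_normal((4, d))              # shared input
Wq = rng.standard_normal((d, d))
Wk = rng.standard_normal((d, d))
Wv = rng.standard_normal((d, d))

# Three separate projections: three gemms over the same X.
q, k, v = X @ Wq, X @ Wk, X @ Wv

# One gemm on the concatenated weight, then a split: same results,
# one kernel launch instead of three.
qkv = X @ np.concatenate([Wq, Wk, Wv], axis=1)
q2, k2, v2 = np.split(qkv, 3, axis=1)
```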
  3. Commit 2bfee7d
  4. [Cherry Pick] refine comments for GradScaler state_dict (PaddlePaddle#36522) (PaddlePaddle#36671)

    Refine comments for GradScaler state_dict.
    zhangbo9674 authored Oct 25, 2021 (commit 304fb2b)
  5. Commit bd40dd9
  6. Add nn.functional.sparse_attention and some test cases, test=develop (PaddlePaddle#35757) (PaddlePaddle#36551)

    Add the paddle.nn.functional.sparse_attention API.

    This PR mainly wraps the sparse_attention functionality at the Python layer; the main OP code is in PR35676.

    It also adds unit tests for the wrapped Python interface.
    Liu-xiandong authored Oct 25, 2021 (commit c57d1e9)
  7. Commit 8ebee86
  8. Commit 6ecfe80
  9. Commit a9b7d1d
  10. Commit 7612bf1
  11. cherry-pick (PaddlePaddle#36653)

    Cherry-picked PRs:

    PaddlePaddle#36568: fix fc fuse compat problem
    PaddlePaddle#36610: support lite xpu choose device id
    PaddlePaddle#36010: update lite branch
    PaddlePaddle#36628: add file exists check
    jiweibo authored Oct 25, 2021 (commit cb33835)
  12. Commit 0951bfd
  13. Commit a540769
  14. Commit 5f1b193
  15. Commit bdcc2ad
  16. Commit 4d3c7f3
  17. [cherry-pick] Fix grid sampler (PaddlePaddle#36625)

    * Fix grid sampler

    * Fix code format
    wanghaoshuang authored Oct 25, 2021 (commit 668db93)
  18. [cherry-pick 2.2] static model parallel dropout support deterministic RandomSeedGenerator (PaddlePaddle#36682)

    * Revert "Add fused_dropout wrapper to ease use. (PaddlePaddle#36185) (PaddlePaddle#36640)"

    This reverts commit 05d7e2f.

    * [hybrid] seed and dropout op support force-cpu (PaddlePaddle#35820)

    * [HIP] fix op not supporting AMD GPU; the flag PADDLE_WITH_ROCM is invalid

    * [hybrid] fix seed ci failed issue

    * add AsExtra for force_cpu of seed op

    * Add fused_dropout wrapper to ease use. (PaddlePaddle#36185)

    * [hybrid] static model parallel dropout support deterministic RandomSeedGenerator (PaddlePaddle#36228)

    Co-authored-by: xiayanming <[email protected]>
    Co-authored-by: Li Min <[email protected]>
    3 people authored Oct 25, 2021 (commit 59615ff)

Commits on Oct 26, 2021

  1. Commit 37ac0dd
  2. [cherry-pick] Support CPU Parallel in DataParallel Interface by GLOO to speed up training (PaddlePaddle#35745) (PaddlePaddle#36605)

    * User-specified backend (PaddlePaddle#35745)

    * remove tensordot
    2742195759 authored Oct 26, 2021 (commit beb920c)
  3. Commit 3fbb664
  4. [cherry-pick-2.2] Fused attention op forward (PaddlePaddle#35905) (P…

    …addlePaddle#36708)
    
    Goal: this PR improves the computational performance of the attention module.
    To reduce the framework's op-scheduling overhead, the attention module is implemented by hand at the C++ level and exposed as one large attention op.
    To reduce memory-access overhead, two optimizations are applied:
    (1) when computing q, k, and v, the input X is shared, reducing the gemm, transpose, and bias-add there from three calls to one;
    (2) kernel fusion is used so that data is passed between CUDA kernels through registers.
    limin2021 authored Oct 26, 2021
    Commit d2be870
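The shared-input optimization described above can be sketched in plain Python (an illustrative toy, not the actual C++/CUDA implementation; the matrices and the `matmul` helper are made up for the example): column-concatenating Wq, Wk, and Wv lets a single GEMM read X once instead of three times.

```python
def matmul(a, b):
    # naive dense matrix multiply over lists of rows
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

X = [[1.0, 2.0], [3.0, 4.0]]     # toy activations: 2 tokens, hidden dim 2
Wq = [[1.0, 0.0], [0.0, 1.0]]
Wk = [[2.0, 0.0], [0.0, 2.0]]
Wv = [[0.5, 0.0], [0.0, 0.5]]

# unfused: three GEMMs, X is read three times
q, k, v = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)

# fused: one GEMM against the column-concatenated weight [Wq | Wk | Wv]
W_qkv = [rq + rk + rv for rq, rk, rv in zip(Wq, Wk, Wv)]
qkv = matmul(X, W_qkv)
q2 = [row[0:2] for row in qkv]
k2 = [row[2:4] for row in qkv]
v2 = [row[4:6] for row in qkv]

assert (q, k, v) == (q2, k2, v2)   # same result, one pass over X
```

The real op additionally fuses the transpose and bias-add into the same pass, which this sketch omits.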
  5. Commit 32fe5a4
  6. add slot record support for GpuPS (PaddlePaddle#36723)

    * add slotrecord datafeed (PaddlePaddle#36099)
    
    * fix multi-node (PaddlePaddle#36329)
    yaoxuefeng6 authored Oct 26, 2021
    Commit 53480c9
  7. [Amp] refine code of amp level (PaddlePaddle#36362) (PaddlePaddle#36726)

    * refine amp level
    
    * fix typo
    
    * update tracer._amp_level
    zhiqiu authored Oct 26, 2021
    Commit 1ee4fc3
  8. Support various length support for SelectedRows in GLOO::AllGather (P…

    …addlePaddle#36637) (PaddlePaddle#36722)
    
    Support variable-length SelectedRows in GLOO::AllGather (PaddlePaddle#36637)
    
        In CPU parallel training using Gloo, add variable-length support for SelectedRows
    2742195759 authored Oct 26, 2021
    Commit fced11b
  9. Commit 616ce20
  10. Add bincount op (PaddlePaddle#36317) (PaddlePaddle#36709)

    * Add bincount op
    
    * upload cpu version
    
    * fix unitest
    
    * fix unittest
    
    * fix unittest
    
    * fix en doc
    
    * add more test
    
    * fix en doc
    
    * add more test case
    
    * fix test
    
    * fix input validation
    
    * fix input check
    
    * fix unittest
    
    * fix test
    
    * fix en doc
    
    cherry-pick
    smallv0221 authored Oct 26, 2021
    Commit 610a810
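For reference, the semantics of a bincount op can be sketched in a few lines of plain Python (a behavioral sketch assuming the usual numpy-style `minlength` parameter; not Paddle's implementation):

```python
def bincount(xs, minlength=0):
    """Count occurrences of each non-negative integer in xs."""
    size = max(max(xs) + 1 if xs else 0, minlength)
    out = [0] * size
    for x in xs:
        out[x] += 1
    return out

print(bincount([0, 1, 1, 3]))           # [1, 2, 0, 1]
print(bincount([0, 1], minlength=4))    # [1, 1, 0, 0]
```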
  11. Pool3d 2.0 (PaddlePaddle#36545) (PaddlePaddle#36721)

    feng_shuai authored Oct 26, 2021
    Commit dfda193
  12. [cherry-pick]add op: fused_feedforward(forward) (PaddlePaddle#36729)

    This is a fusion operator that computes the feed-forward layer in the transformer model architecture.
    zhangkaihuo authored Oct 26, 2021
    Commit 77034fc
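As a rough sketch of what the fused operator covers, the unfused feed-forward computation looks like this in plain Python (toy matrices, no bias/dropout/layer-norm; all names here are illustrative, not Paddle code):

```python
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def feed_forward(x, w1, w2):
    # two linear layers with an activation in between, plus the residual;
    # the fused op computes this chain (with bias/dropout/layernorm) in
    # far fewer kernel launches
    h = relu(matmul(x, w1))
    y = matmul(h, w2)
    return [[xi + yi for xi, yi in zip(xr, yr)] for xr, yr in zip(x, y)]

I = [[1.0, 0.0], [0.0, 1.0]]
print(feed_forward([[1.0, -1.0]], I, I))   # [[2.0, -1.0]]
```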
  13. [cherry-pick]Support FP16 in HybridParallel and Fix bugs in HybridOpt…

    …imizer (PaddlePaddle#36707)
    
    * fix bugs in HybridParallelClipGrad of hybrid_parallel_optimizer (PaddlePaddle#36237)
    
    * fix bugs in HybridParallelClipGrad of hybrid_parallel_optimizer
    
    * update
    
    * update
    
    * fix bugs in mp_layers、pp_layers and HybridParallelClipGrad (PaddlePaddle#36144)
    
    * fix calling bug of HybridParallelClipGrad
    
    * fix bugs of HybridParallelClipGrad
    
    * add unittest of pp with HybridParallelClipGrad
    
    * fix bugs in mp_layers.py
    
    * update
    
    * fix bugs in pp_layers.py
    
    * update
    
    * [HybridParallel]Rebuild code for pipeline (PaddlePaddle#36396)
    
    * add no_sync for parameters sync
    
    * add pipeline for moe
    
    * [HybridParallel]Support fp16 in dygraph hybrid parallel (PaddlePaddle#36420)
    
    * [HybridParallel]Support fp16 in dygraph hybrid parallel
    
    * update
    
    * update
    
    * update for recompute
    
    * add unittest of pp+fp16
    
    * add unittest of recompute+fp16
    
    * update
    
    * modify ut
    
    * modify ut of cond (PaddlePaddle#36475)
    
    * fix bugs of ClipGradByGlobalNorm in HybridParallel (PaddlePaddle#36555)
    
    * fix bugs of ClipGradByGlobalNorm
    
    * add unittests
    
    * add unittests
    
    * [HybridParallel]fix bug of check_inf in fleet_base.py (PaddlePaddle#36651)
    
    * fix bug of check_inf
    
    * fix allreduce
    
    * support ClipGradByGlobalNorm in sharding (PaddlePaddle#36012)
    
    * support ClipGradByGlobalNorm in sharding
    
    * support ClipGradByGlobalNorm in sharding
    
    * test=allcase
    
    * Update test_linalg_cond.py
    
    * Update hybrid_parallel_util.py
    
    * Update hybrid_parallel_util.py
    
    Co-authored-by: ShenLiang <[email protected]>
    Co-authored-by: zhaoyingli <[email protected]>
    3 people authored Oct 26, 2021
    Commit 5b357e0
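For context, ClipGradByGlobalNorm (which several of the fixes above touch) rescales all gradients by one common factor when their global norm exceeds the threshold. A minimal sketch in plain Python (illustrative only; the hybrid-parallel implementation also aggregates the squared norm across ranks before scaling):

```python
import math

def clip_by_global_norm(grads, clip_norm):
    # grads: list of gradient vectors (lists of floats)
    global_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if global_norm <= clip_norm or global_norm == 0.0:
        return grads                        # no clipping needed
    scale = clip_norm / global_norm
    return [[g * scale for g in vec] for vec in grads]

# global norm is 5.0, so everything is scaled by 1.0 / 5.0
clipped = clip_by_global_norm([[3.0], [4.0]], clip_norm=1.0)
```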
  14. [cherry pick] add op: fused_feedforward(backward) (PaddlePaddle#36730)

    * add op: fused_feedforward(backward) (PaddlePaddle#35611)
    
    This PR adds the backward code for fused_feedforward.
    
    Related kernel implementations: fused_dropout_act_bias, fused_residual_dropout_bias, fused_layernorm_residual_dropout_bias.
    
    fused_feedforward is a fusion operator that fuses and wraps the operators of the transformer model's feed-forward layer, so the frontend exposes a single interface; the fusion cuts part of the memory-access and kernel-launch time, improving performance.
    
    * Move fused_attention and fused_feedforward functional api path to incubate (PaddlePaddle#36704)
    
    Moves the Python API added in PaddlePaddle#35905 and PaddlePaddle#35843 to the incubate directory.
    zhangkaihuo authored Oct 26, 2021
    Commit 76c1bae
  15. [Cherry-pick] Add FasterTokenizer Operator (PaddlePaddle#36716)

    * Add FasterTokenizer Operator (PaddlePaddle#34491)
    
    Add tokenizer-related functionality for the Transformer model so that the training and prediction processes are consistent.
    
    * support the text string as an input Tensor
    * support the "VOCAB" unordered_map<wstring, int> as an input Tensor to look up tokens
    * Tokenizer used for BERT. This tokenizer applies an end-to-end, text-string-to-wordpiece tokenization.
    * It first applies basic tokenization, followed by wordpiece tokenization.
    
    * optimize fast tokenizer
    
    * remove const_cast
    
    Co-authored-by: zhoushunjie <[email protected]>
    Co-authored-by: wawltor <[email protected]>
    3 people authored Oct 26, 2021
    Commit edff5b7
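The wordpiece step mentioned above is a greedy longest-match-first subword split; a simplified sketch with a toy vocabulary (the real tokenizer also performs basic tokenization, lowercasing, and unicode handling first):

```python
def wordpiece(word, vocab):
    # greedy longest-match-first subword split (BERT-style)
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece       # continuation prefix
            if piece in vocab:
                pieces.append(piece)
                break
            end -= 1
        else:
            return ["[UNK]"]               # no subword matched
        start = end
    return pieces

vocab = {"un", "##aff", "##able"}
print(wordpiece("unaffable", vocab))       # ['un', '##aff', '##able']
```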
  16. [Paddle-Inference]Add MatmulV2ToMatmul convert Pass, fix (matmul_v2, …

    …matmul, mul) convert pass, fix (matmul, mul) op_teller (PaddlePaddle#36652) (PaddlePaddle#36737)
    Wangzheee authored Oct 26, 2021
    Commit 30ce925
  17. fix wrong trt dim when input dim is 2 (PaddlePaddle#36614) (PaddlePad…

    …dle#36732)
    
    * fix wrong trt dim when input dim is 2
    
    * update leaky_relu and instance_norm converter unit test
    
    * add instance_norm input dim check
    baoachun authored Oct 26, 2021
    Commit da6e514
  18. Commit 211cf20

Commits on Oct 27, 2021

  1. Add fused attention op backward and python layer. (PaddlePaddle#36498) (

    PaddlePaddle#36752)
    
    Goal: this PR improves the computational performance of the attention module.
    To reduce the framework's op-scheduling overhead, the attention module is implemented by hand at the C++ level and exposed as one large attention op.
    To reduce memory-access overhead, two optimizations are applied:
    (1) when computing q, k, and v, the input X is shared, reducing the gemm, transpose, and bias-add there from three calls to one;
    (2) kernel fusion is used so that data is passed between CUDA kernels through registers.
    limin2021 authored Oct 27, 2021
    Commit 64643d5
  2. fix BatchNorm for fp16 (PaddlePaddle#36376) (PaddlePaddle#36691)

    * fix BatchNorm for fp16
    GuoxiaWang authored Oct 27, 2021
    Commit 417b22d
  3. Commit 3fc24e0
  4. Commit 9d2e092
  5. Commit b080d98
  6. Modify paddle.static.nn.cond doc (PaddlePaddle#36694) (PaddlePaddle#3…

    …6767)
    
    Update `cond` English document
    zhhsplendid authored Oct 27, 2021
    Commit c542d57
  7. bugfix: only check backend when mode == Collective (PaddlePaddle#36758) (

    PaddlePaddle#36772)
    
    * bugfix: only check backend when mode == Collective
    2742195759 authored Oct 27, 2021
    Commit 5402f8e
  8. [cherry-pick]Fused transformer encoder layer and fused feedforward l…

    …ayer PaddlePaddle#36776
    
    This PR adds the layer-level code for fused_transformer, including the FusedFeedForward layer and the FusedTransformerEncoderLayer.
    zhangkaihuo authored Oct 27, 2021
    Commit e1b5b1d
  9. Commit 7cb7535

Commits on Oct 28, 2021

  1. show paddle traceback after last user code traceback (PaddlePaddle#36741

    ) (PaddlePaddle#36765)
    
    show paddle traceback after last user code traceback
    0x45f authored Oct 28, 2021
    Commit 96edcea
  2. [Cherry-pick]FFT function enhancements and bugfixes (PaddlePaddle#36537)

    * update fft api path (PaddlePaddle#36219)
    
    * update fft api path
    * add sample code for ihfft2
    
    Co-authored-by: chenfeiyu <[email protected]>
    
    * fix fft axis (PaddlePaddle#36321)
    
    fix: `-1` is used when fft's axis is `0`
    
    * use unified external error message for cufft api (PaddlePaddle#36114)
    
    * fft: modify sample code result (PaddlePaddle#36325)
    
    * dynamically load mkl as a fft backend when it is available and requested (PaddlePaddle#36414)
    
    * add rocm support for fft api (PaddlePaddle#36415)
    
    * move signal apis
    
    * move fft and signal API path (#2)
    
    * move signal apis
    
    * move fft.py and signal.py to paddle/, fix typos
    
    * fix relative imports from fft.py and signal.py
    
    * fix typos in signal.py (#3)
    
    * move signal apis
    
    * move fft.py and signal.py to paddle/, fix typos
    
    * fix relative imports from fft.py and signal.py
    
    * fix typos
    
    * disable Cache when CUFFT_VERSION >= 10200 (#4)
    
    * move signal apis
    
    * move fft.py and signal.py to paddle/, fix typos
    
    * fix relative imports from fft.py and signal.py
    
    * fix typos
    
    * Add LRUCache for fft plans
    
    * add LRUCache for cuff and hipfft (#5)
    
    * move signal apis
    
    * move fft.py and signal.py to paddle/, fix typos
    
    * fix relative imports from fft.py and signal.py
    
    * fix typos
    
    * WIP: add cache
    
    * delete move constructor and operator= for CuFFTHandle and FFTConfig
    
    * remove log from CuFFTHandle and FFTConfig
    
    * add lrucache for fft rocm backend
    
    * disable LRUCache when CUFFT_VERSION >= 10200
    
    * disable copy and move for hipFFTHandle; format code
    
    Co-authored-by: Xiaoxu Chen <[email protected]>
    
    * remove debug message of cufftHandler
    
    * roll_op: support Tensor as input for shifts (PaddlePaddle#36727)
    
    * fix fftshift/ifftshift on static mode
    
    * update roll_op version
    
    * add more test cases for fftshift/ifftshift
    
    Co-authored-by: zhiboniu <[email protected]>
    Co-authored-by: chenfeiyu <[email protected]>
    Co-authored-by: LJQ❤️ <[email protected]>
    4 people authored Oct 28, 2021
    Commit 11b9f5f
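The LRU plan cache added for the cuFFT/hipFFT backends can be sketched generically (a minimal sketch using `OrderedDict`; the key shape and capacity here are illustrative, not Paddle's actual values):

```python
from collections import OrderedDict

class PlanCache:
    """Keep the most recently used FFT plans, evicting the oldest."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self._plans = OrderedDict()

    def get(self, key, create_plan):
        if key in self._plans:
            self._plans.move_to_end(key)     # mark as most recently used
            return self._plans[key]
        plan = create_plan(key)
        self._plans[key] = plan
        if len(self._plans) > self.capacity:
            self._plans.popitem(last=False)  # evict least recently used
        return plan

cache = PlanCache(capacity=2)
made = []
plan = lambda key: made.append(key) or key   # records every plan creation
cache.get((64,), plan); cache.get((128,), plan)
cache.get((64,), plan)    # hit: no new plan built
cache.get((256,), plan)   # miss: evicts (128,), the least recently used
print(made)               # [(64,), (128,), (256,)]
```

Caching plans matters because building an FFT plan is far more expensive than executing one; the commit disables the cache for CUFFT_VERSION >= 10200, where a workaround was needed.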
  3. Fix fused_attention_op and fused_feedforward_op bug when pre_layer_no…

    …rm is false. (PaddlePaddle#36793) (PaddlePaddle#36816)
    
    * Fix bug when pre_layer_norm is false.
    limin2021 authored Oct 28, 2021
    Commit ae59223
  4. [Cherry-pick] Enable CTC grad compute on GPU (PaddlePaddle#36780)

    * Revert "Align CTC grad scale same with ESPNet (PaddlePaddle#34729)"
    
    This reverts commit 10f9644.
    
    * ctc grad compute on gpu
    zh794390558 authored Oct 28, 2021
    Commit 8ede9e6
  5. change api to support trt8 in pool3d_op_convert (PaddlePaddle#36783) (P…

    …addlePaddle#36812)
    
    * change api for support trt8
    feng_shuai authored Oct 28, 2021
    Commit 5fb2850
  6. [fix-doc-bug] Fix fused_attention_op english doc test=document_fix (P…

    …addlePaddle#36803) (PaddlePaddle#36829)
    
    * Fix fused_attention english doc test=document_fix
    limin2021 authored Oct 28, 2021
    Commit 9a96490
  7. [cherry-pick 2.2]support quantization of bert (PaddlePaddle#36820)

    * [cherry-pick 2.2]support quantization of bert
    
    support quantization for maumul_v2
    
    * Update quantization_pass.py
    XGZhang11 authored Oct 28, 2021
    Commit f20c5c9
  8. Commit 7647d40
  9. Cherry-pick-36556: add paddle.version.cuda and paddle.version.cudnn A…

    …PI (PaddlePaddle#36556) (PaddlePaddle#36795)
    
    * add paddle.version.cuda and paddle.version.cudnn API
    
    * fix little bug
    
    * fix bug
    
    * add doc string
    
    * fix mkdir error
    
    * fix windows path
    
    * fix new paddle/version path
    
    * fix unittest
    
    * fix format
    pangyoki authored Oct 28, 2021
    Commit 05b8630
  10. fix device docs;test=document_fix (PaddlePaddle#36784) (PaddlePaddle#…

    …36827)
    
    * fix device docs;test=document_fix
    
    * update __init__.py
    Ligoml authored Oct 28, 2021
    Commit 0b7f43e
  11. Commit e3db65d
  12. Commit d8ffb26
  13. Commit c716cf3

Commits on Oct 29, 2021

  1. 1. fix ifftshift(missing negative sign before shifts); (PaddlePaddle#…

    …36835)
    
    2. add complex data type support for paddle.shape at graph assembly.
    Feiyu Chan authored Oct 29, 2021
    Commit fa7aa6b
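The sign bug matters for odd-length inputs: fftshift rolls by n//2, and ifftshift must roll by -(n//2) to invert it (for even n the two coincide). A 1-D, list-based sketch in plain Python for illustration:

```python
def roll(xs, shift):
    # circularly rotate a non-empty list to the right by `shift`
    shift %= len(xs)
    return xs[-shift:] + xs[:-shift] if shift else xs[:]

def fftshift(xs):
    return roll(xs, len(xs) // 2)

def ifftshift(xs):
    # the negative sign here is what the fix restores
    return roll(xs, -(len(xs) // 2))

x = [0, 1, 2, 3, 4]          # odd length: the case that exposes the bug
assert ifftshift(fftshift(x)) == x
assert fftshift(ifftshift(x)) == x
```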
  2. Commit f2daef5
  3. Commit 09bc9c0

Commits on Nov 1, 2021

  1. Commit dcadc25
  2. [cherry-pick]fix cusparse compile bug in CUDA11.2, test=release/2.2 (P…

    …addlePaddle#36913)
    
    * fix cusparse compile bug in CUDA11.2, test=develop
    
    * fix bug
    Liu-xiandong authored Nov 1, 2021
    Commit ab2004b

Commits on Nov 8, 2021

  1. Commit 76cab75
  2. Optimized the solve op code:renamed var and removed template func (Pa…

    …ddlePaddle#36981) (PaddlePaddle#37011)
    
        Renamed the variable and function
        Removed the original template function
        Removed the tests_properties in CMakeLists.txt
    veyron95 authored Nov 8, 2021
    Commit a787b27

Commits on Nov 10, 2021

  1. Fix rnn grad bug in cpu when dropout is zero (PaddlePaddle#37080) (Pa…

    …ddlePaddle#37086)
    
    * fix rnn grad bug when num_layers is set to 2 and dropout_prob is set to 0
    
    * add more test for rnn
    joey12300 authored Nov 10, 2021
    Commit 70cb0a5

Commits on Nov 15, 2021

  1. MLPerf Optimization for Release/2.2 (PaddlePaddle#37109)

    * add mlperf optimization PRs
    
    * update
    sneaxiy authored Nov 15, 2021
    Commit 287ca7d

Commits on Nov 16, 2021

  1. Commit dc873eb
  2. fix bug of indexing with ellipsis (PaddlePaddle#37192)

    Fixes an incorrect dimension check when indexing a 1-D Tensor with an ellipsis (...).
    zyfncg authored Nov 16, 2021
    Commit 79b9f47
  3. [cherry-pick-2.2.1]fix fused_transformer_encoder_layer bug (PaddlePad…

    …dle#37229)
    
    Fixes several issues found while fine-tuning fused_transformer_encoder_layer:
    
        add attn_mask=None support to fused_attention_op: PR
        fix pre_layer_norm handling: PR
        fix parameter handling and a computation error: PR
        fix an add_bias computation error: PR
        add pure fp16 support: PR
    zhangkaihuo authored Nov 16, 2021
    Commit 36dd295

Commits on Nov 17, 2021

  1. Commit 8cb370f
  2. Commit 71b04f6
  3. Commit 3fdbab2
  4. [Paddle-Inference] fix_qkv_plugin: fix half scale (PaddlePaddle#37096) (

    PaddlePaddle#37264)
    
    * fix_qkv_plugin: half_scale
    
    * [Paddle-Inference] fix_qkv_plugin: fix half scale
    Wangzheee authored Nov 17, 2021
    Commit 027664e

Commits on Nov 19, 2021

  1. [cherry-pick]Add sparse attention doc warning (PaddlePaddle#37189)

    * fix cusparse compile bug in CUDA11.2, test=develop
    
    * modify sparse_attention docs, test=document_fix (PaddlePaddle#36554)
    
    * modify sparse_attention docs, test=develop
    
    * add warning
    
    * add warning ,test=document_fix
    Liu-xiandong authored Nov 19, 2021
    Commit 5fd8312
  2. set net.forward to original forward function in flops (PaddlePaddle#3…

    …6852) (PaddlePaddle#37357)
    
    set net.forward to original forward function in flops when net is a dy2stat model.
    0x45f authored Nov 19, 2021
    Commit b559475
  3. [Dy2stat]Support for i in [1,2,3] statements in dy2stat (PaddlePadd…

    …le#37259) (PaddlePaddle#37356)
    
    This PR enables the dynamic-to-static (dy2stat) module to correctly transform statements such as `for i in [1, 2, 3]`.
    0x45f authored Nov 19, 2021
    Commit 44db219

Commits on Nov 22, 2021

  1. fix bug to support dropout eval grad computing. (PaddlePaddle#37305) (P…

    …addlePaddle#37331)
    
    fix bug to support dropout eval grad computing. cherry-pick PaddlePaddle#37305.
    limin2021 authored Nov 22, 2021
    Commit 604b6fc
  2. [cherry-pick] Add paddle.incubate.graph_send_recv API(PaddlePaddle#37205

    ) (PaddlePaddle#37343)
    
    * Add paddle.incubate.graph_send_recv API
    
    * fix bug in CudaAtomicMin and CudaAtomicMax
    
    * add empty line
    DesmonDay authored Nov 22, 2021
    Commit 109f8a8
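Conceptually, graph_send_recv gathers features along source indices and scatter-reduces them at destination indices. A scalar-feature sketch in plain Python (sum pooling only; the real API also supports other pool types and tensor-valued features — parameter names mirror the API but the implementation is illustrative):

```python
def graph_send_recv(x, src_index, dst_index, pool_type="sum"):
    assert pool_type == "sum", "sketch implements sum pooling only"
    out = [0.0] * len(x)
    for s, d in zip(src_index, dst_index):
        out[d] += x[s]   # "send" x[s] along the edge, "recv" at node d
    return out

# edges 0->1, 1->2, 2->1 on a 3-node graph
print(graph_send_recv([1.0, 2.0, 3.0], [0, 1, 2], [1, 2, 1]))  # [0.0, 4.0, 2.0]
```

On GPU the scatter-add step is what requires the atomic min/max/add primitives that the commit fixes.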
  3. Fix a bug of quantization (PaddlePaddle#36982) (PaddlePaddle#37381)

    * fix a quantization bug
    
    Co-authored-by: XGZhang <[email protected]>
    ceci3 and XGZhang11 authored Nov 22, 2021
    Commit 9ffb43b

Commits on Nov 23, 2021

  1. Commit 6b3ffe9
  2. Commit 0fa96e9
  3. Commit 2778fcd
  4. [Dy2stat]Allow users to switch eval/train mode when using @to_static …

    …to decorate a function (PaddlePaddle#37383) (PaddlePaddle#37432)
    
    Before this PR, when a standalone function was decorated with @to_static, the generated Program could not switch between train/eval modes and always ran in train mode; as a result, GPU memory kept growing as the user called the function repeatedly after dynamic-to-static conversion.
    After this PR, a standalone function decorated with @to_static can switch between train and eval modes via function.train() or function.eval().
    0x45f authored Nov 23, 2021
    Commit eed736d
  5. [Cherry-pick 2.2]Enhance error message of scatter op (PaddlePaddle#37431

    )
    
    * enhance scatter err msg check
    
    * fix ci error
    sneaxiy authored Nov 23, 2021
    Commit d5e73f0
  6. cherry pick save/load in the_one_ps (PaddlePaddle#37461)

    * save/load in ps runtime(the_one_ps) (PaddlePaddle#36097)
    
    * add trainer desc config to distributed strategy
    
    * code style modified
    
    * data_feed set lod
    
    * fix bug
    
    * code style
    
    * fix bug
    
    * save load
    
    * save load
    
    * save unittest
    
    * add unittest of the_one_ps
    
    * unittest
    
    * add todo in communicator sendsparse
    
    * fix bug in save_inference_model (PaddlePaddle#37362)
    esythan authored Nov 23, 2021
    Commit 58a5113
  7. Commit 436808c
  8. [cherry-pick]Refactor Heterogenous Pipeline Parameter Server (PaddleP…

    …addle#37446)
    
    * bug fix for  DeserializeSelectedRows. test=develop (PaddlePaddle#36520)
    
    * fix SerializeSelectedRows (PaddlePaddle#36543)
    
    * bug fix for  DeserializeSelectedRows. test=develop
    
    * fix bug for SerializeSelectedRows. test=develop
    
    * update. test=develop
    
    * [Heterps]Refactor Heter Pipeline Parameter Server (PaddlePaddle#36845)
    
    * change username
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    * fix
    
    * update
    
    * update
    
    * update unittests
    
    * fix
    
    * update
    
    * fix
    
    * update
    
    * fix
    
    * fix
    
    * fix
    
    * update
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update send_and_recv op. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * update. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix ut. test=develop
    
    * fix unit. notest,test=coverage
    
    * fix ut. notest, test=coverage
    
    * update. notest,test=coverage
    
    * fix ut. notest, test=coverage
    
    * fix ut. notest, test=coverage
    
    * fix. notest, test=coverage
    
    * fix. notest, test=coverage
    
    * fix ut. notest, test=coverage
    
    * fix ut. notest, test=coverage
    
    * fix ut. notest, test=coverage
    
    * fix ut. notest, test=coverage
    
    * add func. notest, test=coverage
    
    * fix ut. notest, test=coverage
    
    * fix. test=develop
    
    * fix. test=develop
    
    * Fix unit test for send_and_recv_cpu & send_and_recv_gpu (PaddlePaddle#37129)
    
    * [heterps]fix ut for heter_pipeline_trainer.cc  (PaddlePaddle#37136)
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * [heterps]bug fix for local training with --heter_worker_num (PaddlePaddle#37166)
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * [heterps]Refactor heterogenous worker (PaddlePaddle#37244)
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * refactor heter trainer. test=develop
    
    * fix. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * [heterps]add heterps mode judgement (PaddlePaddle#37298)
    
    * [heterps]change default executor for heter trainer (PaddlePaddle#37314)
    
    * fix pslib. test=develop
    
    * add device to train_from_dataset. test=develop
    
    * refine fleet.stop_worker. test=develop
    
    * fix ut. test=develop
    
    * fix ut. test=develop
    
    * fix executor & ut. test=develop
    
    * fix executor & ut. test=develop
    
    * fix executor & ut. test=develop
    
    * [heterps]remove api for heter pipeline ps (PaddlePaddle#37396)
    
    * fix api. test=develop
    
    * fix api. test=develop
    
    * fix code style. test=release/2.2
    
    * fix CMakeLists. test=develop (PaddlePaddle#37454)
    zmxdream authored Nov 23, 2021
    Commit 4dc426f
  9. Commit f873d3a

Commits on Nov 24, 2021

  1. [Cherry pick 2.2] fix bugs to support bias add none for fused_attent…

    …ion op. (PaddlePaddle#37411) (PaddlePaddle#37483)
    
    Add support for bias being None in the fused_attention op.
    limin2021 authored Nov 24, 2021
    Commit bed652d

Commits on Nov 25, 2021

  1. Cherry-pick PR 37420, fix inplace bug when the first grad_var(loss_gr…

    …ad) is inplace var (PaddlePaddle#37420) (PaddlePaddle#37488)
    
    fix inplace bug,Cherry pick PR PaddlePaddle#37420
    pangyoki authored Nov 25, 2021
    Commit d31d597
  2. [cherry-pick-2.2.1]Opt topk (PaddlePaddle#37325)

    The existing fused_attention_op does not support attn_mask=None as input; this PR adds that support, along with the corresponding unit-test logic.
    zhangkaihuo authored Nov 25, 2021
    Commit 89fb196
  3. Commit 824c4ef
  4. [cherry-pick 2.2]fix data parallel when VOCAB var in program (Paddle…

    …Paddle#37546)
    
    * fix data parallel when VOCAB var in program
    
    * fix ci coverage
    Steffy-zxf authored Nov 25, 2021
    Commit c8429d3

Commits on Nov 26, 2021

  1. Commit ca8b858
  2. [cherry-pick 2.2 heterps]bug fix for launch_utils.py (PaddlePaddle#37521

    ) (PaddlePaddle#37570)
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * fix. test=develop
    
    * [heterps]bug fix for _run_from_dataset
    
    * fix heter_server.cc
    
    * fix launch_utils.py
    
    * fix heter_section_worker.cc
    
    * fix. test=develop
    
    * fix. test=develop
    zmxdream authored Nov 26, 2021
    4b41b8e
  3. 3a81805
  4. fix bug of slice_grad using use_mkldnn attr (PaddlePaddle#37584)

    The slice_grad op hit an error during kernel selection: when fetching the use_mkldnn attribute, the key was not present in the map, so an out_of_range exception was thrown.
    This PR adds a check for the key's existence before reading the use_mkldnn attribute from the map, avoiding the exception.
    zyfncg authored Nov 26, 2021
    14fd53d
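    The guard described in that fix (check a map for the key before reading it, so a missing attribute cannot throw) can be sketched in Python; the helper and attribute names below are illustrative stand-ins, not Paddle's actual AttributeMap API:

    ```python
    def get_attr(attrs, name, default):
        # Reading a missing key with attrs[name] would raise KeyError --
        # the Python analogue of std::map::at throwing std::out_of_range
        # in the C++ op. Checking membership first avoids the exception.
        if name in attrs:
            return attrs[name]
        return default

    # Hypothetical slice_grad attribute map with no "use_mkldnn" entry.
    attrs = {"axes": [0], "starts": [0], "ends": [3]}
    use_mkldnn = get_attr(attrs, "use_mkldnn", False)  # default, no exception
    ```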

Commits on Nov 28, 2021

  1. 4066713

Commits on Nov 29, 2021

  1. Fix bugs when bias add is none in static graph for fused_attention op. (P…

    …addlePaddle#37566) (PaddlePaddle#37608)
    
    cherry-pick of PR PaddlePaddle#37566:
    
    Based on PaddlePaddle#37411, this PR:
    
        Continue to fix the bugs when bias add is none in static graph for fused_attention op.
        Polish and improve the unittests in test_fused_attention_op_api.py.
    limin2021 authored Nov 29, 2021
    46988e2
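    The bug class behind these two fused_attention fixes is an unguarded bias add; a minimal pure-Python sketch of the optional-bias pattern (the function below is a stand-in invented for illustration, not the fused_attention kernel):

    ```python
    def linear_with_optional_bias(x, weight, bias=None):
        # x: list of rows; weight: in_dim x out_dim matrix as nested lists.
        out = [[sum(xi[k] * weight[k][j] for k in range(len(weight)))
                for j in range(len(weight[0]))] for xi in x]
        # The bias add must be guarded: with bias=None there is nothing to
        # add, so both the forward op and its grad need an explicit branch.
        if bias is not None:
            out = [[v + b for v, b in zip(row, bias)] for row in out]
        return out

    x = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
    w = [[1.0] * 4 for _ in range(3)]
    no_bias = linear_with_optional_bias(x, w)               # every entry 3.0
    with_bias = linear_with_optional_bias(x, w, [1.0] * 4)  # every entry 4.0
    ```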
  2. fix pass_desc.proto compilation error, test=develop (PaddlePaddle#37614)

    cherry-pick PaddlePaddle#37536
    
    Fixes a dependency issue that occurred when compiling pass_desc.proto.
    Avin0323 authored Nov 29, 2021
    7d9c669
  3. Fix dropout static when axis != None (PaddlePaddle#37223) (PaddlePadd…

    …le#37589)
    
    * fix dropout static when axis != None
    
    * update dropout test
    
    * add dropout test
    
    * fix test
    
    * Update test_dropout_op.py
    
    * Update test_dropout_op.py
    
    * fix testcase
    
    * fix testcase
    
    * Update test_dropout_op.py
    
    * fix testcase
    
    * fix testcase
    
    * optimize perf
    
    * add new test
    
    * fix testcase
    smallv0221 authored Nov 29, 2021
    3a0c550

Commits on Nov 30, 2021

  1. a5cf2e3

Commits on Dec 1, 2021

  1. cherry-pick to 2.2 (PaddlePaddle#37238)

    * py2 to py3 bug and iface fix for pslib (PaddlePaddle#36102)
    
    * avoid setting logging.basicConfig (PaddlePaddle#37031)
    kuizhiqing authored Dec 1, 2021
    fe43bee

Commits on Dec 3, 2021

  1. 56b1ccb
  2. 6ece0b1

Commits on Dec 6, 2021

  1. 615b33f

Commits on Dec 7, 2021

  1. Fix cflags D_GLIBCXX_USE_CXX11_ABI takes no effect problem in customi…

    …zed op (PaddlePaddle#37878) (PaddlePaddle#37899)
    
    Fix cflags D_GLIBCXX_USE_CXX11_ABI takes no effect problem in customized op
    Aurelius84 authored Dec 7, 2021
    81be365
  2. Fix default behavior if block=None in static mode (PaddlePaddle#37827) (

    PaddlePaddle#37898)
    
    Fix default behavior if block=None in static mode (PaddlePaddle#37827)
    Aurelius84 authored Dec 7, 2021
    72a6c14

Commits on Dec 8, 2021

  1. 4114c4a

Commits on Dec 9, 2021

  1. 026de65

Commits on Dec 10, 2021

  1. a4c0c71
  2. 8b86aad
  3. fix: when ceil_mode==true && Padding_algo!=SAME, (x-size)/stride != …

    …int, this conversion is wrong (PaddlePaddle#37929) (PaddlePaddle#38033)
    
    Co-authored-by: feng_shuai <[email protected]>
    shangzhizhou and feng_shuai authored Dec 10, 2021
    0e5846c
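    The off-by-one that title describes comes from pooling output-size arithmetic: floor and ceil modes only agree when the window span divides the stride evenly. A minimal sketch of the generic 1-D formula (illustrative only, not the converter code itself):

    ```python
    import math

    def pool_out_size(in_size, ksize, stride, pad, ceil_mode):
        # Output length of a 1-D pooling op. With ceil_mode the last
        # partial window is kept, so the two modes differ exactly when
        # (in_size + 2*pad - ksize) is not divisible by stride.
        span = in_size + 2 * pad - ksize
        if ceil_mode:
            return math.ceil(span / stride) + 1
        return span // stride + 1

    # 10-wide input, 3-wide window, stride 2, no padding:
    # (10 - 3) / 2 = 3.5, so floor gives 4 outputs and ceil gives 5.
    floor_out = pool_out_size(10, 3, 2, 0, False)  # 4
    ceil_out = pool_out_size(10, 3, 2, 0, True)    # 5
    ```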

Commits on Dec 12, 2021

  1. Remove additional warning in layer.to (PaddlePaddle#36700)

    * remove additional warning in layer.to
    
    * remove additional warning in layer.to
    
    * remove additional warning in layer.to
    
    * remove additional warning in layer.to
    
    * remove additional warning in layer.to
    JiabinYang authored and zhangbo9674 committed Dec 12, 2021
    721e78c
  2. Refine param conversion logic in layer.to (PaddlePaddle#36862)

    * refine layer to
    
    * delete comment
    
    * refine logic
    
    * refine code
    
    * refine pure_fp16_init
    
    * refine comment
    zhangbo9674 committed Dec 12, 2021
    c4df875
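    The param-conversion idea this PR refines (walk a layer's parameters and convert only those not already in the target dtype, avoiding redundant casts and warnings) can be illustrated with a toy sketch; the classes below are stand-ins invented for illustration, not Paddle's Layer/Parameter API:

    ```python
    class Param:
        def __init__(self, data, dtype):
            self.data = data
            self.dtype = dtype

    class ToyLayer:
        """Toy stand-in for a framework layer holding named parameters."""
        def __init__(self, params):
            self.params = params

        def to(self, dtype=None):
            # Convert each parameter in place, skipping ones already in
            # the target dtype -- the kind of redundant-conversion check
            # a refined Layer.to can use to avoid extra casts.
            for p in self.params.values():
                if dtype is not None and p.dtype != dtype:
                    p.dtype = dtype  # a real framework also casts p.data
            return self

    layer = ToyLayer({"w": Param([1.0, 2.0], "float32")})
    layer.to(dtype="float16")  # "w" is converted; a second call is a no-op
    ```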
  3. 7b7e8de