[cherry pick] Refine param conversion logic in layer.to #38058

Closed · wants to merge 195 commits
f72d52e
[cherry-pick] trt engine dtor when the last predictor dtor (#35881)
jiweibo Sep 22, 2021
fb8be03
[cherry-pick2.2]support extern third_party lapack API on Linux/Window…
zhwesky2010 Sep 22, 2021
1787936
[cherry-pick]increase test_imperative_auto_mixed_precision time PROP…
zhangbo9674 Sep 22, 2021
c053520
[cherry-pick] fix bug of module 'paddle' has no attribute 'fluid' fo…
zhangbo9674 Sep 22, 2021
2aaa417
[cherry-pick] [Inference] Support NNAdapter and ascend310 (#35882)
jiweibo Sep 22, 2021
bba41e4
fix bug of module 'paddle' has no attribute 'distributed' for python3…
GuoxiaWang Sep 22, 2021
6cc8b16
add hard_sigmoid trt converter test cases (#35908)
baoachun Sep 22, 2021
0f34483
[Cherry-pick 2.2] Correct the return type of elementwise kernel to a…
Xreki Sep 22, 2021
c67cf85
fix: delete_quant_dequant_filter_op_pass, delete_quant_dequant_op_pas…
Wangzheee Sep 22, 2021
e8e77eb
Add quant2 int8 lstm model test (#35887) (#35912)
lidanqing-intel Sep 23, 2021
95c100c
op:transpose_op supports bool type (#35886) (#35926)
TeslaZhao Sep 23, 2021
91f25ee
add dilation check for conv (#35894)
jerrywgz Sep 23, 2021
4629401
[cherry-pick] FixEighOP; Unified MatrixEighFunctor function (#35812)…
Zjq9409 Sep 23, 2021
063fca8
add pool2d convert test (#35925)
JZZ-NOTE Sep 24, 2021
0e19aeb
[cherry-pick] fix cusparse compile bug in windows CUDA11.2, test=rele…
Liu-xiandong Sep 24, 2021
ae78940
[cherry-pick] inference fix trt problem (#35939)
jiweibo Sep 24, 2021
efcd108
Basic PR on Cost Model (#35774) (#35915)
zhhsplendid Sep 24, 2021
e9c0414
[cherry-pick] Replace Eigen with Lapack library for eigvals OP kernel…
From00 Sep 24, 2021
33fbdaf
temporarily fix the performance drop of recurrent op (#36053)
baoachun Sep 25, 2021
085eae2
[cherry pick] add pass_desc_py_proto depends, test=develop (#35934)
Avin0323 Sep 26, 2021
e262125
[cherry pick]split minimize and add unscale_ for GradScaler (#35927)
zhangbo9674 Sep 26, 2021
2e473f2
fix pad tuple (#36043)
littletomatodonkey Sep 26, 2021
df81915
[NPU] add randperm_op_npu (#35763) (#36026)
ronny1996 Sep 26, 2021
6b4f2fb
[Cherry-Pick]Add paddle.linalg.solve OP (#35715) (#36056)
veyron95 Sep 26, 2021
05621f7
[cherry-pick] Add function comments and instructions to the Primitiv…
AnnaTrainingG Sep 26, 2021
ba2a1bb
[cherry-pick] Add Det and Slogdet API to Release 2.2 (#36083)
zhhsplendid Sep 26, 2021
14cdcde
Fix errors in example code (#36041) (#36089)
wangzhuang1911 Sep 26, 2021
effb70f
[cherry-pick]CPU forward calculation replaces Eigen with Lapack (#35…
Zjq9409 Sep 26, 2021
bc13ab9
concat api support empty tensor. (#35845) (#36096)
wuhuachaocoding Sep 26, 2021
c3a0eaa
[cherry-pick]Support fixed seed in Python for test (#36065) (#36094)
YuanRisheng Sep 27, 2021
2de7a7f
[cherry pick] Modify adam to adamw in Optimizer AdamW (#36028) (#36103)
zhangbo9674 Sep 27, 2021
6891134
[cherry-pick] fix third_party cache bugs (#36048)
betterpig Sep 27, 2021
81557da
[Cherry-pick] Add new func/class API psroi_pool and UT (#36111)
zoooo0820 Sep 27, 2021
fe5cddf
update externalErrorMsg.tar.gz md5 value (#36126) (#36133)
cxxly Sep 27, 2021
40a2918
[ROCM] fixbug for arg_min_max (#36113)
ronny1996 Sep 27, 2021
4bcff7b
Correct the misspelled part of the unit test (#36101)
lijiaqi0612 Sep 27, 2021
b171aab
fix windows ut precession error (#36124)
jiweibo Sep 27, 2021
5f168af
allow user to export parameters defined in model (#36132)
hp03 Sep 27, 2021
a57f081
remove linalg api in paddle.__init__ (#36112)
zhiboniu Sep 27, 2021
45b7627
bugfix reshape -1 (#36143)
JZ-LIANG Sep 27, 2021
1db28fd
fix adamw DeprecationWarining (#35869) (#36025)
zhaoyinglia Sep 27, 2021
749bc24
cherry-pick #36021 fix unique/unstack zero tensor (#36163)
bjjwwang Sep 27, 2021
cea0bc2
Add paddle.device.cuda.get_device_properties (#35875)
Yanxing-Shi Sep 27, 2021
c576169
[cherry-pick] [ROCM] bugfix for bilinear_interp_v2_grad (#36160) #36161
ronny1996 Sep 28, 2021
632a006
[cherry-pick] update multi_dot exposure rules (#36018) (#36131)
zkh2016 Sep 28, 2021
b0289de
Add roi pool (#35084) (#36154)
lyuwenyu Sep 29, 2021
dd14f7f
[cherry-pick] fix paddle.device.cuda.get_device_properties doc (#36174)
Yanxing-Shi Sep 29, 2021
96fd98b
Add op paddle.device.cuda.get_device_name and paddle.device.cuda.get_…
liyagit21 Sep 29, 2021
4e2daa9
add API paddle.linalg.eig (#35674) (#36188)
AshburnLee Sep 29, 2021
dcd17d6
[cherry-pick] add roi align (#36207)
nemonameless Sep 30, 2021
87cc8d4
support fp16 (#35888) (#36191)
GuoxiaWang Sep 30, 2021
e8efba5
fix bug of reduce_sum when src_dtype != dst_dtype and reduce_num == 1…
GuoxiaWang Sep 30, 2021
789012c
fix the undefined variable bug in dist_transformer file (#36211) (#36…
youth123 Sep 30, 2021
28d1200
Fix raw optim (#36176) (#36231)
youth123 Sep 30, 2021
70e6784
add optest for adamw (#36148) (#36239)
zhaoyinglia Sep 30, 2021
45de931
[cherry-pick]fix hasattr(paddle.fluid.ir.PassDesc.OP, '__name__') err…
Avin0323 Oct 11, 2021
21c65f6
[cherry-pick]C++ support register pass via PassDesc (#36302)
Avin0323 Oct 11, 2021
31a5829
dlpack fix (#35817) (#36177)
DesmonDay Oct 11, 2021
10eebfa
fix yolo precision issue(#36365)
b3602sss Oct 12, 2021
a6868c9
Fix stop_gradient in RunProgramOp (#36339) (#36353)
Aurelius84 Oct 12, 2021
ce6a27d
fix for matmul_v2 6D x 2D (#36379)
jakpiase Oct 13, 2021
7a66160
[cherrypick] change paddle.mm api to matmul v2 op (#36374)
wawltor Oct 13, 2021
a5767bb
delete remove_static_file() function in error.py (#36153) (#36375)
0x45f Oct 13, 2021
976f014
fix windows bug that python virtual env can't find python executable …
zhwesky2010 Oct 14, 2021
fc429fe
[cherry-pick] add sparse_embedding doc (#36312)
Yanxing-Shi Oct 15, 2021
cc44965
[cherry-pick]Verify the correctness of graph rewrited by GeneratePass…
Avin0323 Oct 15, 2021
2b9d192
[Cherry-pick][Dy2stat]fix no_grad context error in train mode when us…
0x45f Oct 18, 2021
d65f8af
Add operators for async read & async write (#36333) (#36501)
DesmonDay Oct 19, 2021
b8167ed
quant support matmul_v2 (#36469) (#36499)
ceci3 Oct 19, 2021
d974dbd
cherry-pick 36424 inference support bert when exists matmul_v2 (#36500)
jiweibo Oct 19, 2021
36edb0e
[cherry-pick]Add sparse attention cherrypick (#36447)
Liu-xiandong Oct 19, 2021
023eb3f
catch the generatorfunction and intercept it. (#35369) (#36536)
2742195759 Oct 20, 2021
b5404f0
[cherry-pick] Inference add type check in copy_from_cpu (#36552)
jiweibo Oct 20, 2021
6a20205
remove no_value using var.name (#36513) (#36565)
0x45f Oct 21, 2021
a201a69
improve replicate pad error information (#36531)
littletomatodonkey Oct 21, 2021
3090988
[Cherry-pick] Add functor_primitives.h for kernel primitive api (#36…
AnnaTrainingG Oct 21, 2021
6840cf5
Fix a bug in ReadData, ReadDataBc and ReadDataReduce when NX != 1 (#3…
AnnaTrainingG Oct 22, 2021
1906c74
Add viterbi decode (#35778) (#36615)
joey12300 Oct 23, 2021
05d7e2f
Add fused_dropout wrapper to ease use. (#36185) (#36640)
limin2021 Oct 25, 2021
8c0bacd
Add fused_attention_op: add impl wrappers. (#35903) (#36673)
limin2021 Oct 25, 2021
2bfee7d
[cherry-pick] Add new API 'tensordot' (#36273) (#36454)
From00 Oct 25, 2021
304fb2b
[Cherry Pick] refine comments for GradScaler state_dict (#36522) (#36…
zhangbo9674 Oct 25, 2021
bd40dd9
[Cherry Pick]Add fp16 kernel for clip_op (#36577) (#36672)
zhangbo9674 Oct 25, 2021
c57d1e9
Add nn.functional.sparse_attention and some test cases, test=develop …
Liu-xiandong Oct 25, 2021
8ebee86
Fix conv2d op_teller error (#36474) (#36664)
JZZ-NOTE Oct 25, 2021
6ecfe80
CUDA sparsity support (#36413) (#36659)
JZZ-NOTE Oct 25, 2021
a9b7d1d
Cherrypick (#36666)
b3602sss Oct 25, 2021
7612bf1
Add pool2d test convert (#36338) (#36663)
JZZ-NOTE Oct 25, 2021
cb33835
cherry-pick (#36653)
jiweibo Oct 25, 2021
0951bfd
[cherry-pick] enable trt test check and fix trt ut error (#36371) (#3…
jiweibo Oct 25, 2021
a540769
[cherry-pick] enable trt test check and fix trt ut error (#36549) (#3…
jiweibo Oct 25, 2021
5f1b193
fix nullptr block in op_teller (#36622)
baoachun Oct 25, 2021
bdcc2ad
fix interpolate mkldnn op error (#36662)
baoachun Oct 25, 2021
4d3c7f3
fix cast cuda implementation (#36679)
sneaxiy Oct 25, 2021
668db93
[cherry-pick]Fix grid sampler (#36625)
wanghaoshuang Oct 25, 2021
59615ff
[cherry-pick 2.2] static model parallel dropout support deterministic…
wangxicoding Oct 25, 2021
37ac0dd
feat: Add TRT support for 3D(batch_norm_op and elementwise_add_op) (#…
Oct 26, 2021
beb920c
[cherry-pick] Support CPU Parallel in DataParallel Interface by GLOO …
2742195759 Oct 26, 2021
3fbb664
add slot record dataset (#36200) (#36710)
yaoxuefeng6 Oct 26, 2021
d2be870
[cherry-pick-2.2] Fused attention op forward (#35905) (#36708)
limin2021 Oct 26, 2021
32fe5a4
cherry pick CrossEntropy's bug fix (#36647)
HydrogenSulfate Oct 26, 2021
53480c9
add slot record support for GpuPS (#36723)
yaoxuefeng6 Oct 26, 2021
1ee4fc3
[Amp] refine code of amp level (#36362) (#36726)
zhiqiu Oct 26, 2021
fced11b
Support various length support for SelectedRows in GLOO::AllGather (#…
2742195759 Oct 26, 2021
616ce20
[Cherry-pick] Add the forward QR operator (#36627)
aoyulong Oct 26, 2021
610a810
Add bincount op (#36317) (#36709)
smallv0221 Oct 26, 2021
dfda193
Pool3d 2.0 (#36545) (#36721)
Oct 26, 2021
77034fc
[cherry-pick]add op: fused_feedforward(forward) (#36729)
Oct 26, 2021
5b357e0
[cherry-pick]Support FP16 in HybridParallel and Fix bugs in HybridOpt…
haohongxiang Oct 26, 2021
76c1bae
[cherry pick] add op: fused_feedforward(backward) (#36730)
Oct 26, 2021
edff5b7
[Cherry-pick] Add FasterTokenizer Operator (#36716)
Steffy-zxf Oct 26, 2021
30ce925
[Paddle-Inference]Add MatmulV2ToMatmul convert Pass, fix (matmul_v2, …
Wangzheee Oct 26, 2021
da6e514
fix wrong trt dim when input dim is 2 (#36614) (#36732)
baoachun Oct 26, 2021
211cf20
[cherry-pick] enable trt test check and fix trt ut error(3/3) (#36696)
jiweibo Oct 26, 2021
64643d5
Add fused attention op backward and python layer. (#36498) (#36752)
limin2021 Oct 27, 2021
417b22d
fix BatchNorm for fp16 (#36376) (#36691)
GuoxiaWang Oct 27, 2021
3fc24e0
Fix inverse in fake quant (#36763)
wanghaoshuang Oct 27, 2021
9d2e092
cherrypick for eigvalsh (#36680)
huangjun12 Oct 27, 2021
b080d98
fix matmul dim error (#36768)
baoachun Oct 27, 2021
c542d57
Modify paddle.static.nn.cond doc (#36694) (#36767)
zhhsplendid Oct 27, 2021
5402f8e
bugfix: only check backend when mode == Collecive (#36758) (#36772)
2742195759 Oct 27, 2021
e1b5b1d
[cherry-pick]Fused transformer encoder layer and fused feedforward l…
Oct 27, 2021
7cb7535
fix ernie serialize problem (#36769) (#36791)
b3602sss Oct 27, 2021
96edcea
show paddle traceback after last user code traceback (#36741) (#36765)
0x45f Oct 28, 2021
11b9f5f
[Cherry-pick]FFT function enhancements and bugfixes (#36537)
cxxly Oct 28, 2021
ae59223
Fix fused_attention_op and fused_feedforward_op bug when pre_layer_no…
limin2021 Oct 28, 2021
8ede9e6
[Cherry-pick] Enable CTC grad compute on GPU (#36780)
zh794390558 Oct 28, 2021
5fb2850
change api to support trt8 in pool3d_op_convert (#36783) (#36812)
Oct 28, 2021
9a96490
[fix-doc-bug] Fix fused_attention_op english doc test=document_fix (#…
limin2021 Oct 28, 2021
f20c5c9
[cherry-pick 2.2]support quantization of bert (#36820)
XGZhang11 Oct 28, 2021
7647d40
Update quant_conv2d_dequant_fuse_pass.cc (#36821)
XGZhang11 Oct 28, 2021
05b8630
Cherry-pick-36556: add paddle.version.cuda and paddle.version.cudnn A…
pangyoki Oct 28, 2021
0b7f43e
fix device docs;test=document_fix (#36784) (#36827)
Ligoml Oct 28, 2021
e3db65d
fix dygraph adamw (#36745) (#36794)
zhaoyinglia Oct 28, 2021
d8ffb26
【Cherry-pick PR 36511】fix out_of_range bug of multinomial op's cuda k…
pangyoki Oct 28, 2021
c716cf3
polish _remove_no_value_return_var() function (#36826) (#36830)
0x45f Oct 28, 2021
fa7aa6b
1. fix ifftshift(missing negative sign before shifts); (#36835)
Oct 29, 2021
f2daef5
tmp disable windows demo_ci ut (#36847)
jiweibo Oct 29, 2021
09bc9c0
Move the ASP training API to paddle.static.sparsity. (#36525) (#36860)
Xreki Oct 29, 2021
dcadc25
negative label in softmax cross entropy (#36907)
xingfeng01 Nov 1, 2021
ab2004b
[cherry-pick]fix cusparse compile bug in CUDA11.2, test=release/2.2 (…
Liu-xiandong Nov 1, 2021
76cab75
setitem support passing stop_gradient from value to tensor (#37028)
zyfncg Nov 8, 2021
a787b27
Optimized the solve op code:renamed var and removed template func (#3…
veyron95 Nov 8, 2021
70cb0a5
Fix rnn grad bug in cpu when dropout is zero (#37080) (#37086)
joey12300 Nov 10, 2021
287ca7d
MLPerf Optimization for Release/2.2 (#37109)
sneaxiy Nov 15, 2021
dc873eb
clean inference logs when config.DisableGlogInfo is triggered (#36356…
Shixiaowei02 Nov 16, 2021
79b9f47
fix bug of indexing with ellipsis (#37192)
zyfncg Nov 16, 2021
36dd295
[cherry-pick-2.2.1]fix fused_transformer_encoder_layer bug (#37229)
Nov 16, 2021
8cb370f
add fetch to black list (#37123) (#37261)
JZZ-NOTE Nov 17, 2021
71b04f6
add_fc_convert_layers_name (#37157) (#37267)
Wangzheee Nov 17, 2021
3fdbab2
cherrypick fix op_teller (#37266)
Wangzheee Nov 17, 2021
027664e
[Paddle-Inference] fix_qkv_plugin: fix half scale (#37096) (#37264)
Wangzheee Nov 17, 2021
5fd8312
[cherry-pick]Add sparse attention doc warning (#37189)
Liu-xiandong Nov 19, 2021
b559475
set net.forward to original forward function in flops (#36852) (#37357)
0x45f Nov 19, 2021
44db219
[Dy2stat]Support `for i in [1,2,3]` statements in dy2stat (#37259) (#…
0x45f Nov 19, 2021
604b6fc
fix bug to support dropout eval grad computing. (#37305) (#37331)
limin2021 Nov 22, 2021
109f8a8
[cherry-pick] Add paddle.incubate.graph_send_recv API(#37205) (#37343)
DesmonDay Nov 22, 2021
9ffb43b
Fix a bug of quantization (#36982) (#37381)
ceci3 Nov 22, 2021
6b3ffe9
cherry_pick conv2d omission (#37419)
JZZ-NOTE Nov 23, 2021
0fa96e9
fix memeory_optimize_pass bug (#37324) (#37413)
JZZ-NOTE Nov 23, 2021
2778fcd
fix shape api (#37412)
jiweibo Nov 23, 2021
eed736d
[Dy2stat]Allow users to switch eval/train mode when using @to_static …
0x45f Nov 23, 2021
d5e73f0
[Cherry-pick 2.2]Enhance error message of scatter op (#37431)
sneaxiy Nov 23, 2021
58a5113
cherry pick save/load in the_one_ps (#37461)
esythan Nov 23, 2021
436808c
elu support alpha < 0 (#37316) (#37437)
zhupengyang Nov 23, 2021
4dc426f
[cherry-pick]Refactor Heterogenous Pipeline Parameter Server (#37446)
zmxdream Nov 23, 2021
f873d3a
bug fix shard_index (#37042) (#37421)
Nov 23, 2021
bed652d
[Cherry pick 2.2] fix bugs to support bias add none for fused_attent…
limin2021 Nov 24, 2021
d31d597
Cherry-pick PR 37420, fix inplace bug when the first grad_var(loss_gr…
pangyoki Nov 25, 2021
89fb196
[cherry-pick-2.2.1]Opt topk (#37325)
Nov 25, 2021
824c4ef
avoid setting logging.basicConfig (#37031) (#37530)
kuizhiqing Nov 25, 2021
c8429d3
[cherry-pick 2.2]fix data parallel when VOCAB var in program (#37546)
Steffy-zxf Nov 25, 2021
ca8b858
Fix the issue of disordered loading cifar data (#37272) (#37528)
LielinJiang Nov 26, 2021
4b41b8e
[cherry-pick 2.2 heterps]bug fix for launch_utils.py (#37521) (#37570)
zmxdream Nov 26, 2021
3a81805
add new API/OP: paddle.linalg.triangular_solve (#36714) (#37551)
zhwesky2010 Nov 26, 2021
14fd53d
fix bug of slice_grad using use_mkldnn attr (#37584)
zyfncg Nov 26, 2021
4066713
fix heter_server (#37604) (#37627)
zmxdream Nov 28, 2021
46988e2
Fix bugs when bias add none in static graph for fused_attention op. (…
limin2021 Nov 29, 2021
7d9c669
fix pass_desc.proto compilation error, test=develop (#37614)
Avin0323 Nov 29, 2021
3a0c550
Fix dropout static when axis != None (#37223) (#37589)
smallv0221 Nov 29, 2021
a5cf2e3
fix range op (#37486) (#37611)
bjjwwang Nov 30, 2021
fe43bee
cherry-pick to 2.2 (#37238)
kuizhiqing Dec 1, 2021
56b1ccb
remove input dim check in op_teller and update ut (#37097) (#37773)
baoachun Dec 3, 2021
6ece0b1
fix trainer_pass.py (#37779) (#37796)
zmxdream Dec 3, 2021
615b33f
bugfix in fleetrun when launching multiple machines training manually…
Liwb5 Dec 6, 2021
81be365
Fix cflags D_GLIBCXX_USE_CXX11_ABI takes no effect problem in customi…
Aurelius84 Dec 7, 2021
72a6c14
Fix default behavior if block=None in static mode (#37827) (#37898)
Aurelius84 Dec 7, 2021
4114c4a
fix ernie (#37839) (#37960)
b3602sss Dec 8, 2021
026de65
[Dy2Stat]Polish for zip in dy2stat (#37846) (#37912)
0x45f Dec 9, 2021
a4c0c71
fix flatten in quant (#37741)
yghstill Dec 10, 2021
8b86aad
fix fetch op rename_input bug in QAT export model (#38025)
yghstill Dec 10, 2021
0e5846c
fix: when ceil_model==true && Padding_algo!=SAME, (x-size)/stride != …
shangzhizhou Dec 10, 2021
721e78c
Remove additional warnning in layer.to (#36700)
JiabinYang Oct 26, 2021
c4df875
Refine param conversion logic in layer.to (#36862)
zhangbo9674 Nov 9, 2021
7b7e8de
Fix Layer.to() of device bug (#37156)
zhangbo9674 Nov 18, 2021
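The last three commits above refine how `Layer.to` converts parameters. A toy, pure-Python sketch of the rule being refined — convert a parameter only when its placement or dtype actually differs from the target — where all names and the dict-based parameter representation are illustrative, not Paddle's actual implementation:

```python
def layer_to(params, device=None, dtype=None):
    """Toy sketch of Layer.to: retarget each parameter, recording which
    ones actually needed conversion (placement or dtype changed)."""
    out, converted = {}, []
    for name, p in params.items():
        tgt_dev = device if device is not None else p["device"]
        tgt_dtype = dtype if dtype is not None else p["dtype"]
        if (tgt_dev, tgt_dtype) != (p["device"], p["dtype"]):
            converted.append(name)
        out[name] = {"device": tgt_dev, "dtype": tgt_dtype, "data": p["data"]}
    return out, converted

params = {
    "weight": {"device": "cpu", "dtype": "float32", "data": [1.0, 2.0]},
    "bias": {"device": "gpu:0", "dtype": "float16", "data": [0.0]},
}
moved, converted = layer_to(params, device="gpu:0", dtype="float16")
assert converted == ["weight"]  # bias already matched the target
assert moved["weight"]["device"] == "gpu:0"
```

Skipping parameters that already match the target is what avoids the redundant conversions (and spurious warnings) these commits address.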
2 changes: 1 addition & 1 deletion cmake/external/dlpack.cmake
@@ -18,7 +18,7 @@ set(DLPACK_PREFIX_DIR ${THIRD_PARTY_PATH}/dlpack)
set(DLPACK_SOURCE_DIR ${THIRD_PARTY_PATH}/dlpack/src/extern_dlpack)

set(DLPACK_REPOSITORY ${GIT_URL}/dmlc/dlpack.git)
-set(DLPACK_TAG v0.2)
+set(DLPACK_TAG v0.4)

cache_third_party(extern_dlpack
REPOSITORY ${DLPACK_REPOSITORY}
70 changes: 70 additions & 0 deletions cmake/external/lapack.cmake
@@ -0,0 +1,70 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

INCLUDE (ExternalProject)

SET(LAPACK_PREFIX_DIR ${THIRD_PARTY_PATH}/lapack)
SET(LAPACK_SOURCE_DIR ${THIRD_PARTY_PATH}/lapack/src/extern_lapack)
SET(LAPACK_INSTALL_DIR ${THIRD_PARTY_PATH}/install/lapack)
SET(LAPACK_INCLUDE_DIR ${LAPACK_SOURCE_DIR})
SET(LAPACK_LIB_DIR ${LAPACK_INSTALL_DIR}/lib)

# Note(zhouwei): lapack need fortan compiler which many machines don't have, so use precompiled library.
# use lapack tag v3.10.0 on 06/28/2021 https://github.com/Reference-LAPACK/lapack
if(LINUX)
SET(LAPACK_VER "lapack_lnx_v3.10.0.20210628" CACHE STRING "" FORCE)
SET(LAPACK_URL "https://paddlepaddledeps.bj.bcebos.com/${LAPACK_VER}.tar.gz" CACHE STRING "" FORCE)
SET(LAPACK_URL_MD5 71f8cc8237a8571692f3e07f9a4f25f6)
SET(GNU_RT_LIB_1 "${LAPACK_LIB_DIR}/libquadmath.so.0")
SET(GFORTRAN_LIB "${LAPACK_LIB_DIR}/libgfortran.so.3")
SET(BLAS_LIB "${LAPACK_LIB_DIR}/libblas.so.3")
SET(LAPACK_LIB "${LAPACK_LIB_DIR}/liblapack.so.3")
elseif(WIN32)
# Refer to [lapack-for-windows] http://icl.cs.utk.edu/lapack-for-windows/lapack/#lapacke
SET(LAPACK_VER "lapack_win_v3.10.0.20210628" CACHE STRING "" FORCE)
SET(LAPACK_URL "https://paddlepaddledeps.bj.bcebos.com/${LAPACK_VER}.zip" CACHE STRING "" FORCE)
SET(LAPACK_URL_MD5 590d080392dcd5abbd5dca767a50b63a)
SET(GNU_RT_LIB_1 "${LAPACK_LIB_DIR}/libquadmath-0.dll")
SET(GNU_RT_LIB_2 "${LAPACK_LIB_DIR}/libgcc_s_seh-1.dll")
SET(GFORTRAN_LIB "${LAPACK_LIB_DIR}/libgfortran-3.dll")
SET(BLAS_LIB "${LAPACK_LIB_DIR}/libblas.dll")
SET(LAPACK_LIB "${LAPACK_LIB_DIR}/liblapack.dll")
else()
SET(LAPACK_VER "lapack_mac_v3.10.0.20210628" CACHE STRING "" FORCE)
SET(LAPACK_URL "https://paddlepaddledeps.bj.bcebos.com/${LAPACK_VER}.tar.gz" CACHE STRING "" FORCE)
SET(LAPACK_URL_MD5 427aecf8dee8523de3566ca8e47944d7)
SET(GNU_RT_LIB_1 "${LAPACK_LIB_DIR}/libquadmath.0.dylib")
SET(GNU_RT_LIB_2 "${LAPACK_LIB_DIR}/libgcc_s.1.dylib")
SET(GFORTRAN_LIB "${LAPACK_LIB_DIR}/libgfortran.5.dylib")
SET(BLAS_LIB "${LAPACK_LIB_DIR}/libblas.3.dylib")
SET(LAPACK_LIB "${LAPACK_LIB_DIR}/liblapack.3.dylib")
endif()

ExternalProject_Add(
extern_lapack
${EXTERNAL_PROJECT_LOG_ARGS}
URL ${LAPACK_URL}
URL_MD5 ${LAPACK_URL_MD5}
PREFIX ${LAPACK_PREFIX_DIR}
DOWNLOAD_DIR ${LAPACK_SOURCE_DIR}
DOWNLOAD_NO_PROGRESS 1
PATCH_COMMAND ""
UPDATE_COMMAND ""
CONFIGURE_COMMAND ""
BUILD_COMMAND ""
INSTALL_COMMAND ${CMAKE_COMMAND} -E copy_directory ${LAPACK_SOURCE_DIR} ${LAPACK_LIB_DIR}
BUILD_BYPRODUCTS ${BLAS_LIB}
BUILD_BYPRODUCTS ${LAPACK_LIB}
)
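The `URL_MD5` fields in the lapack recipe above let the build verify each downloaded archive against a known checksum. A minimal sketch of the same check in Python (the archive bytes here are a stand-in, not one of the real tarballs):

```python
import hashlib

def verify_md5(data: bytes, expected_md5: str) -> bool:
    """Return True if the MD5 digest of `data` matches `expected_md5`,
    mirroring what CMake does with the URL_MD5 option after a download."""
    return hashlib.md5(data).hexdigest() == expected_md5

archive = b"fake archive bytes"  # stand-in for the downloaded tarball
digest = hashlib.md5(archive).hexdigest()
assert verify_md5(archive, digest)
assert not verify_md5(archive, "0" * 32)
```

A mismatched digest makes the download step fail early instead of building against a corrupted or tampered archive.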

33 changes: 32 additions & 1 deletion cmake/external/lite.cmake
@@ -35,14 +35,22 @@ if (LITE_WITH_XPU)
ENDIF()
endif()

if (LITE_WITH_NNADAPTER)
add_definitions(-DLITE_SUBGRAPH_WITH_NNADAPTER)
if (NNADAPTER_WITH_HUAWEI_ASCEND_NPU)
add_definitions(-DLITE_SUBGRAPH_WITH_NPU)
set(NPU_SDK_ROOT "/usr/local/Ascend/ascend-toolkit/latest" CACHE STRING "default NPU SDK ROOT")
endif()
endif()

if (NOT LITE_SOURCE_DIR OR NOT LITE_BINARY_DIR)
include(ExternalProject)
set(LITE_PROJECT extern_lite)
set(LITE_SOURCES_DIR ${THIRD_PARTY_PATH}/lite)
set(LITE_INSTALL_DIR ${THIRD_PARTY_PATH}/install/lite)

if(NOT LITE_GIT_TAG)
-set(LITE_GIT_TAG d3a3a6931b6d22d504d21ba32b3ae972770e9204)
+set(LITE_GIT_TAG 62fc737d4a553bca738f96b0402b28f26a8d2d4f)
endif()

if(NOT CUDA_ARCH_NAME)
@@ -67,6 +75,9 @@ if (NOT LITE_SOURCE_DIR OR NOT LITE_BINARY_DIR)
-DLITE_WITH_XPU=${LITE_WITH_XPU}
-DXPU_SDK_URL=${XPU_BASE_URL}
-DXPU_SDK_ENV=${XPU_SDK_ENV}
-DLITE_WITH_NNADAPTER=${LITE_WITH_NNADAPTER}
-DNNADAPTER_WITH_HUAWEI_ASCEND_NPU=${NNADAPTER_WITH_HUAWEI_ASCEND_NPU}
-DNNADAPTER_HUAWEI_ASCEND_NPU_SDK_ROOT=${NPU_SDK_ROOT}
-DLITE_WITH_CODE_META_INFO=OFF
-DLITE_WITH_ARM=ON)
ExternalProject_Add(
@@ -110,6 +121,9 @@ if (NOT LITE_SOURCE_DIR OR NOT LITE_BINARY_DIR)
-DLITE_WITH_XPU=${LITE_WITH_XPU}
-DXPU_SDK_URL=${XPU_BASE_URL}
-DXPU_SDK_ENV=${XPU_SDK_ENV}
-DLITE_WITH_NNADAPTER=${LITE_WITH_NNADAPTER}
-DNNADAPTER_WITH_HUAWEI_ASCEND_NPU=${NNADAPTER_WITH_HUAWEI_ASCEND_NPU}
-DNNADAPTER_HUAWEI_ASCEND_NPU_SDK_ROOT=${NPU_SDK_ROOT}
-DLITE_WITH_CODE_META_INFO=OFF
-DLITE_WITH_ARM=OFF)

@@ -120,6 +134,7 @@ if (NOT LITE_SOURCE_DIR OR NOT LITE_BINARY_DIR)
GIT_TAG ${LITE_GIT_TAG}
PREFIX ${LITE_SOURCES_DIR}
UPDATE_COMMAND ""
PATCH_COMMAND sed -i "s?NNadapter_bridges_path = os.path.abspath('..')+\"\/lite\/kernels\/nnadapter\/bridges\/paddle_use_bridges.h\"?NNadapter_bridges_path = os.path.abspath(\'..\')+\"\/extern_lite\/lite\/kernels\/nnadapter\/bridges\/paddle_use_bridges.h\"?" ${LITE_SOURCES_DIR}/src/extern_lite//lite/tools/cmake_tools/record_supported_kernel_op.py
BUILD_COMMAND ${LITE_BUILD_COMMAND}
INSTALL_COMMAND ""
CMAKE_ARGS -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}
@@ -146,6 +161,11 @@ endif()
if (WITH_ARM)
if(LITE_WITH_XPU)
set(LITE_OUTPUT_BIN_DIR inference_lite_lib.armlinux.armv8.xpu)
elseif(LITE_WITH_NNADAPTER)
message("Enable LITE_WITH_NNADAPTER")
if (NNADAPTER_WITH_HUAWEI_ASCEND_NPU)
set(LITE_OUTPUT_BIN_DIR inference_lite_lib.armlinux.armv8.nnadapter)
endif()
else()
set(LITE_OUTPUT_BIN_DIR inference_lite_lib.armlinux.armv8)
endif()
@@ -174,5 +194,16 @@ endfunction()
external_lite_libs(lite_full_static ${LITE_BINARY_DIR}/${LITE_OUTPUT_BIN_DIR}/cxx/lib/libpaddle_full_api_shared.so)
set(LITE_SHARED_LIB ${LITE_BINARY_DIR}/${LITE_OUTPUT_BIN_DIR}/cxx/lib/libpaddle_full_api_shared.so)

if (LITE_WITH_NNADAPTER)
set(LITE_NNADAPTER_LIB ${LITE_BINARY_DIR}/${LITE_OUTPUT_BIN_DIR}/cxx/lib/libnnadapter.so)
if (NNADAPTER_WITH_HUAWEI_ASCEND_NPU)
external_lite_libs(lite_nnadapter ${LITE_BINARY_DIR}/${LITE_OUTPUT_BIN_DIR}/cxx/lib/libnnadapter.so ${LITE_BINARY_DIR}/${LITE_OUTPUT_BIN_DIR}/cxx/lib/libhuawei_ascend_npu.so)
set(LITE_DEPS lite_full_static lite_nnadapter)
set(LITE_NNADAPTER_NPU_LIB ${LITE_BINARY_DIR}/${LITE_OUTPUT_BIN_DIR}/cxx/lib/libhuawei_ascend_npu.so)
endif()
else()
set(LITE_DEPS lite_full_static)
endif()

add_definitions(-DPADDLE_WITH_LITE)
add_definitions(-DLITE_WITH_LOG)
51 changes: 51 additions & 0 deletions cmake/external/utf8proc.cmake
@@ -0,0 +1,51 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

INCLUDE(ExternalProject)

SET(UTF8PROC_PREFIX_DIR ${THIRD_PARTY_PATH}/utf8proc)
SET(UTF8PROC_INSTALL_DIR ${THIRD_PARTY_PATH}/install/utf8proc)
# As we add extra features for utf8proc, we use the non-official repo
SET(UTF8PROC_REPOSITORY ${GIT_URL}/JuliaStrings/utf8proc.git)
SET(UTF8PROC_TAG v2.6.1)

IF(WIN32)
SET(UTF8PROC_LIBRARIES "${UTF8PROC_INSTALL_DIR}/lib/utf8proc_static.lib")
add_definitions(-DUTF8PROC_STATIC)
ELSE(WIN32)
SET(UTF8PROC_LIBRARIES "${UTF8PROC_INSTALL_DIR}/lib/libutf8proc.a")
ENDIF(WIN32)

INCLUDE_DIRECTORIES(${UTF8PROC_INSTALL_DIR}/include)

ExternalProject_Add(
extern_utf8proc
${EXTERNAL_PROJECT_LOG_ARGS}
${SHALLOW_CLONE}
GIT_REPOSITORY ${UTF8PROC_REPOSITORY}
GIT_TAG ${UTF8PROC_TAG}
PREFIX ${UTF8PROC_PREFIX_DIR}
UPDATE_COMMAND ""
CMAKE_ARGS -DCMAKE_C_FLAGS=${CMAKE_C_FLAGS}
-DBUILD_SHARED=ON
-DBUILD_STATIC=ON
-DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS}
-DCMAKE_INSTALL_PREFIX:PATH=${UTF8PROC_INSTALL_DIR}
-DCMAKE_BUILD_TYPE:STRING=${CMAKE_BUILD_TYPE}
BUILD_BYPRODUCTS ${UTF8PROC_LIBRARIES}
)

ADD_LIBRARY(utf8proc STATIC IMPORTED GLOBAL)
SET_PROPERTY(TARGET utf8proc PROPERTY IMPORTED_LOCATION ${UTF8PROC_LIBRARIES})
ADD_DEPENDENCIES(utf8proc extern_utf8proc)
2 changes: 1 addition & 1 deletion cmake/external/xpu.cmake
@@ -35,7 +35,7 @@ ELSE ()
ENDIF()

SET(XPU_BASE_URL_WITHOUT_DATE "https://baidu-kunlun-product.cdn.bcebos.com/KL-SDK/klsdk-dev")
-SET(XPU_BASE_URL "${XPU_BASE_URL_WITHOUT_DATE}/20210909")
+SET(XPU_BASE_URL "${XPU_BASE_URL_WITHOUT_DATE}/20210921")
SET(XPU_XRE_URL "${XPU_BASE_URL}/${XPU_XRE_DIR_NAME}.tar.gz" CACHE STRING "" FORCE)
SET(XPU_XDNN_URL "${XPU_BASE_URL}/${XPU_XDNN_DIR_NAME}.tar.gz" CACHE STRING "" FORCE)
SET(XPU_XCCL_URL "${XPU_BASE_URL_WITHOUT_DATE}/20210623/${XPU_XCCL_DIR_NAME}.tar.gz" CACHE STRING "" FORCE)
5 changes: 5 additions & 0 deletions cmake/inference_lib.cmake
@@ -124,6 +124,11 @@ function(copy_part_of_thrid_party TARGET DST)
SRCS ${GLOG_INCLUDE_DIR} ${GLOG_LIBRARIES}
DSTS ${dst_dir} ${dst_dir}/lib)

set(dst_dir "${DST}/third_party/install/utf8proc")
copy(${TARGET}
SRCS ${UTF8PROC_INSTALL_DIR}/include ${UTF8PROC_LIBRARIES}
DSTS ${dst_dir} ${dst_dir}/lib)

if (WITH_CRYPTO)
set(dst_dir "${DST}/third_party/install/cryptopp")
copy(${TARGET}
6 changes: 4 additions & 2 deletions cmake/operators.cmake
@@ -185,6 +185,8 @@ function(op_library TARGET)
list(REMOVE_ITEM hip_srcs "cholesky_op.cu")
list(REMOVE_ITEM hip_srcs "matrix_rank_op.cu")
list(REMOVE_ITEM hip_srcs "svd_op.cu")
list(REMOVE_ITEM hip_srcs "eigvalsh_op.cu")
list(REMOVE_ITEM hip_srcs "qr_op.cu")
list(REMOVE_ITEM hip_srcs "eigh_op.cu")
list(REMOVE_ITEM hip_srcs "multinomial_op.cu")
list(REMOVE_ITEM hip_srcs "decode_jpeg_op.cu")
@@ -214,9 +216,9 @@ function(op_library TARGET)
foreach(manual_pybind_op "compare_all_op" "compare_op" "logical_op" "bitwise_op" "nccl_op"
"tensor_array_read_write_op" "tensorrt_engine_op" "conv_fusion_op"
"fusion_transpose_flatten_concat_op" "fusion_conv_inception_op"
-"sync_batch_norm_op" "dgc_op" "fused_fc_elementwise_layernorm_op"
+"sync_batch_norm_op" "sparse_attention_op" "dgc_op" "fused_fc_elementwise_layernorm_op"
"skip_layernorm_op" "multihead_matmul_op" "fusion_group_op" "fused_bn_activation_op" "fused_embedding_eltwise_layernorm_op" "fusion_gru_op" "fusion_lstm_op"
-"fused_bn_add_activation_op")
+"fused_bn_add_activation_op" "fused_attention_op" "fused_feedforward_op" "resnet_unit_op")
if ("${TARGET}" STREQUAL "${manual_pybind_op}")
set(pybind_flag 1)
endif()
11 changes: 8 additions & 3 deletions cmake/third_party.cmake
@@ -210,9 +210,14 @@ include(external/threadpool)# download threadpool
include(external/dlpack) # download dlpack
include(external/xxhash) # download, build, install xxhash
include(external/warpctc) # download, build, install warpctc
include(external/utf8proc) # download, build, install utf8proc

list(APPEND third_party_deps extern_eigen3 extern_gflags extern_glog extern_boost extern_xxhash)
list(APPEND third_party_deps extern_zlib extern_dlpack extern_warpctc extern_threadpool)
list(APPEND third_party_deps extern_zlib extern_dlpack extern_warpctc extern_threadpool extern_utf8proc)
include(external/lapack) # download, build, install lapack

list(APPEND third_party_deps extern_eigen3 extern_gflags extern_glog extern_boost extern_xxhash)
list(APPEND third_party_deps extern_zlib extern_dlpack extern_warpctc extern_threadpool extern_lapack)

include(cblas) # find first, then download, build, install openblas

@@ -250,8 +255,8 @@ if(WITH_GPU)
include(external/cub) # download cub
list(APPEND third_party_deps extern_cub)
endif()
-set(URL "https://paddlepaddledeps.bj.bcebos.com/externalErrorMsg.tar.gz" CACHE STRING "" FORCE)
-file_download_and_uncompress(${URL} "externalError" MD5 c0749523ebb536eb7382487d645d9cd4) # download file externalErrorMsg.tar.gz
+set(URL "https://paddlepaddledeps.bj.bcebos.com/externalErrorMsg_20210928.tar.gz" CACHE STRING "" FORCE)
+file_download_and_uncompress(${URL} "externalError" MD5 a712a49384e77ca216ad866712f7cafa) # download file externalErrorMsg.tar.gz
if(WITH_TESTING)
# copy externalErrorMsg.pb, just for unittest can get error message correctly.
set(SRC_DIR ${THIRD_PARTY_PATH}/externalError/data)
18 changes: 3 additions & 15 deletions paddle/fluid/distributed/service/brpc_utils.cc
@@ -138,23 +138,11 @@ void SerializeSelectedRows(framework::Variable* var,
var_data->clear();
var_data->resize(rows->size() * sizeof(int64_t));
char* data_ptr = const_cast<char*>(var_data->data());

-if (platform::is_cpu_place(tensor->place())) {
-memcpy(data_ptr, &(*rows)[0], rows->size() * sizeof(int64_t));
-} else {
-#ifdef PADDLE_WITH_CUDA
-auto stream =
-reinterpret_cast<const platform::CUDADeviceContext&>(ctx).stream();
-memory::Copy(platform::CPUPlace(), data_ptr,
-BOOST_GET_CONST(platform::CUDAPlace, tensor->place()),
-&(*rows)[0], rows->size() * sizeof(int64_t), stream);
-#endif
-}
+memcpy(data_ptr, &((*rows)[0]), rows->size() * sizeof(int64_t));
var_msg->set_data_type(static_cast<VarMsg::Type>(tensor->type()));
for (auto& dim : framework::vectorize(tensor->dims())) {
var_msg->add_dims(dim);
}

// IO Buffer
if (platform::is_cpu_place(tensor->place())) {
auto data_len = tensor->numel() * framework::SizeOfType(tensor->type());
@@ -273,8 +261,8 @@ void DeserializeSelectedRows(framework::Variable* var, const VarMsg& msg,
auto* slr = var->GetMutable<framework::SelectedRows>();
framework::Tensor* tensor = slr->mutable_value();
slr->set_height(msg.slr_height());
-std::vector<int64_t> tmp_rows(msg.slr_height());
-memcpy(&tmp_rows[0], msg.data().data(), msg.slr_height() * sizeof(int64_t));
+std::vector<int64_t> tmp_rows(msg.dims()[0]);
+memcpy(tmp_rows.data(), msg.data().data(), msg.dims()[0] * sizeof(int64_t));
slr->set_rows(tmp_rows);
std::vector<int> vec_dim;
for (auto& x : msg.dims()) {
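The DeserializeSelectedRows change in this file sizes the row buffer from the serialized row count rather than the nominal height, since a SelectedRows structure can hold fewer rows than its height. A hedged Python sketch of that idea (names and the wire format are illustrative, not Paddle's actual protocol):

```python
import struct

def serialize_rows(rows, height):
    """Pack a SelectedRows-like structure: nominal height plus the
    actual row indices (which may be fewer than `height`)."""
    payload = struct.pack("<q", height) + struct.pack(f"<{len(rows)}q", *rows)
    return payload, len(rows)

def deserialize_rows(payload, num_rows):
    """Read exactly `num_rows` indices -- the count that was serialized,
    not the nominal height -- to avoid over-reading the buffer."""
    height = struct.unpack_from("<q", payload, 0)[0]
    rows = list(struct.unpack_from(f"<{num_rows}q", payload, 8))
    return height, rows

payload, n = serialize_rows([3, 7, 11], height=100)
height, rows = deserialize_rows(payload, n)
assert (height, rows) == (100, [3, 7, 11])
```

Sizing by the height (100 here) instead of the serialized count (3) would read past the end of the data, which is the class of bug the fix guards against.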
27 changes: 25 additions & 2 deletions paddle/fluid/distributed/service/communicator.cc
@@ -13,7 +13,6 @@ See the License for the specific language governing permissions and
limitations under the License. */

#include "paddle/fluid/distributed/service/communicator.h"

#include <google/protobuf/text_format.h>

#include "gflags/gflags.h"
@@ -283,6 +282,18 @@ void Communicator::RpcSendSparse(const std::string &var_name, int table_id,
push_g_vec.push_back(tensor->mutable_value()->data<float>() + i * dim);
}

// TODO(wangguanqun): padding_idx is not ignored, this is a bug.
// if padding_idx == padding in datareader, the server will core.
/*
for (size_t i = 0; i < tensor->rows().size(); ++i) {
uint64_t real_id = static_cast<uint64_t>(tensor->rows()[i]);
if (real_id != 0) {
sparse_push_keys.push_back(real_id);
push_g_vec.push_back(tensor->mutable_value()->data<float>() + i * dim);
}
}
*/

++_async_call_num;
DownpourBrpcClosure *closure = new DownpourBrpcClosure(
request_call_num, [this, request_call_num](void *done) {
@@ -349,10 +360,23 @@ void Communicator::InitParams(const RecvCtxMap &recv_varname_to_ctx) {
<< " from 0' trainer done";
}
}
std::this_thread::sleep_for(
std::chrono::milliseconds(100 + trainer_id_ * 10));
BarrierWithTable(1);
return;
}
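The sleep added to InitParams above staggers trainers by rank (100 ms base plus 10 ms per trainer id) so they do not all hit the barrier at once. A minimal sketch of that per-rank backoff, with the constants assumed from the diff:

```python
def startup_delay_ms(trainer_id: int, base_ms: int = 100, step_ms: int = 10) -> int:
    """Per-trainer startup delay: each rank waits slightly longer than
    the previous one, spreading out arrival at a shared barrier."""
    return base_ms + trainer_id * step_ms

delays = [startup_delay_ms(rank) for rank in range(4)]
assert delays == [100, 110, 120, 130]
```

The 10 ms step keeps total added latency small while still separating the ranks' RPC bursts in time.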

void Communicator::PullDense(const RecvCtxMap &recv_varname_to_ctx) {
for (auto &iter : recv_varname_to_ctx) {
auto &table_id = iter.first;
auto &varnames = iter.second;
RpcRecvDense(varnames, table_id, recv_scope_);
VLOG(1) << "pull dense param to table " << table_id
<< " from 0' trainer done";
}
return;
}

void Communicator::RpcProfilerControl() {
if (trainer_id_ == 0) {
if (!do_server_profiler_ && platform::IsProfileEnabled()) {
@@ -495,7 +519,6 @@ void AsyncCommunicator::SendByCommunicator() {
MergeVars<float>(var_name, vars[i], send_scope_.get(), 1);
}
}

if (ctx.is_tensor_table) {
SendGlobalStep(ctx, merged_var_num, send_scope_.get());
} else if (ctx.is_sparse) {
2 changes: 2 additions & 0 deletions paddle/fluid/distributed/service/communicator.h
@@ -271,6 +271,8 @@ class Communicator {

virtual void InitParams(const RecvCtxMap &recv_varname_to_ctx);

virtual void PullDense(const RecvCtxMap &recv_varname_to_ctx);

virtual void Start() = 0;

virtual void Stop() = 0;