
update #3

Merged on Apr 19, 2021 (146 commits; showing changes from all commits).

Commits
17030ff
fix op benchmark ci error caused by missing test_pr branch, test=docu…
Avin0323 Mar 30, 2021
c4b60ef
Fix segment Fault from set_value (#31891)
Aurelius84 Mar 30, 2021
64ee255
[Paddle-TRT] yolobox (#31755)
zlsh80826 Mar 30, 2021
8084b75
fix batchnorm when inpu dims < 3 (#31933)
shangzhizhou Mar 30, 2021
73a6fa3
add deprecated for softmax_with_cross_entropy (#31722)
chajchaj Mar 30, 2021
fe28486
add exclusive for test_conv2d_op, test=develop (#31936)
jerrywgz Mar 30, 2021
04a49b0
[Custom OP]Remove old custom OP and reduce whl package volume (#31813)
zhwesky2010 Mar 30, 2021
e50bc2c
Enhance cmake to support specifying CUDA_ARCH_NAME to Ampere. (#31923)
Xreki Mar 30, 2021
e1f9316
Fix save/load error in imperative qat UT. (#31937)
wzzju Mar 30, 2021
245252b
fix bug when dtype of to_tensor is core.VarType (#31931)
zhwesky2010 Mar 30, 2021
14b7e3c
[Paddle-TRT] TRT inference support for BERT/Transformer in paddle 2.0…
cryoco Mar 30, 2021
6dca7a1
Added int8 kernel for oneDNN LSTM op (#31894)
jakpiase Mar 30, 2021
a37a7f6
modify CI recommend information (#31395)
Avin0323 Mar 30, 2021
98e803e
map_matmul_to_mul_pass support 3dim (#31958)
cryoco Mar 30, 2021
0fa6c8a
fix a syntax error, test=develop (#31930)
Shixiaowei02 Mar 30, 2021
57d4288
[dynamic setitem] Fix bug of dynamic setitem: Decerease axes to do ri…
liym27 Mar 30, 2021
95f808c
fix stack op grad nullptr (#31962)
bjjwwang Mar 30, 2021
ef8323d
[ROCM] Add ROCm support for warpctc op (#31817)
windstamp Mar 31, 2021
5394194
support minus-int idx to LayerList (#31750)
lyuwenyu Mar 31, 2021
52b05ba
fix some bug in transformer training in xpu (#31918)
taixiurong Mar 31, 2021
3a95a0b
update cmake minimum version to 3.15 (#31807)
Avin0323 Mar 31, 2021
393b3bd
fix split core (#31892)
Thunderbrook Mar 31, 2021
b09c1ce
fix whl package push pypi (#31585)
tianshuo78520a Mar 31, 2021
587d99a
update compilation with C++14 (#31815)
Avin0323 Mar 31, 2021
495e7f9
Update eigen version to f612df27 (#31832)
Avin0323 Mar 31, 2021
e973bd7
Polish tensor pipeline (#31701)
heavengate Mar 31, 2021
ea738dd
delete cuda9 code (#31883)
tianshuo78520a Mar 31, 2021
6f85e24
fix one error massage (#31904)
Kqnonrime Mar 31, 2021
695dd37
Adjust pipeline optimizer for 3d parallelism (#31939)
Mar 31, 2021
b05f614
[Parallel UT]Improve Parallel UT level on Windows/Linux (#31377)
zhwesky2010 Mar 31, 2021
d5b5004
Delete legacy C++ training user-interface (#31949)
tianshuo78520a Mar 31, 2021
eb3199f
fix compilation error on rocm, test=develop (#31991)
Avin0323 Apr 1, 2021
6b74486
fix en doc for emb (#31980)
seiriosPlus Apr 1, 2021
dbeb3ea
Refactor and simplify hook design & add Tensor.register_hook API (#31…
chenwhql Apr 1, 2021
0774159
new group (#31682)
kuizhiqing Apr 1, 2021
980227f
Support uint8_t for fill_constant_op (#31911)
ZzSean Apr 1, 2021
4acc87b
Optimize the perf of SameDimsAdd CUDA Kernel (#31872)
ZzSean Apr 1, 2021
b807e40
[Paddle-TRT] add anchor generator op plugin (#31730)
zlsh80826 Apr 1, 2021
0589ed2
LOG CLEAN (#31819)
seiriosPlus Apr 1, 2021
9c5d028
remove useless code (#32001)
hutuxian Apr 1, 2021
83b953f
add custom init grad for backward function (#31540)
MingMingShangTian Apr 1, 2021
40e6c57
fix doc of Pooling layers (#31977)
weisy11 Apr 1, 2021
8460698
Support control flow in DataParallel (#31625)
ForFishes Apr 1, 2021
1b6c1d3
fix doc preblem (#32010)
kuizhiqing Apr 1, 2021
68e7de2
fix use_softmax=False does not work, test=develop
chajchaj Apr 1, 2021
a4b30a1
[ROCM] fix depthwise conv failure on ROCM, test=develop (#31998)
qili93 Apr 1, 2021
df5aff8
fix typo in spawn (#32017)
chenwhql Apr 1, 2021
0e52cdf
delete test_data_generator (#31987)
yaoxuefeng6 Apr 1, 2021
0b42f48
fix random compile failed on windows (#32032)
zhwesky2010 Apr 2, 2021
4490e8a
add leaky_relu forward and backward in activation_op.cu (#31841)
AnnaTrainingG Apr 2, 2021
9e06a64
[ROCM] fix softmax_with_cross_entropy_op (#31982)
ronny1996 Apr 2, 2021
94736d6
graph engine (#31226)
seemingwang Apr 2, 2021
d918786
update trt engine addplugin name. (#32018)
jiweibo Apr 2, 2021
ed49b41
update plugin creator name (#32021)
jiweibo Apr 2, 2021
cd74b20
Add more ops to calculate output scales (#32036)
juncaipeng Apr 2, 2021
bf10d56
fix decorator in py2 (#32043)
tianshuo78520a Apr 2, 2021
43367e4
support save/load single tensor (#31756)
hbwx24 Apr 2, 2021
69c874f
[3D-Parallel:Sharding] Optimizations for supporting ERNIE 3.0 trainin…
JZ-LIANG Apr 2, 2021
290be88
use busybox run test on windows openblas (#31728)
XieYunshen Apr 2, 2021
36687d7
delete temporary files (#32055)
hbwx24 Apr 3, 2021
1e52f32
Optimize elementwise_add_grad op, test=develop (#32051)
thisjiang Apr 3, 2021
a3b08ba
[ROCM] fix the backward maxpool (#32030)
ronny1996 Apr 6, 2021
2e82b6c
[Hybrid Parallel] Add Topology for hybrid communicate (#32011)
ForFishes Apr 6, 2021
9e8f903
fix two error message (#32039)
Kqnonrime Apr 6, 2021
6d6ea56
remove pass restrictions for skip-ln pass (#32081)
cryoco Apr 6, 2021
b17e36a
[PaddleTRT] Yolov3 bugfix (#32064)
zlsh80826 Apr 6, 2021
78af100
fix test of affine_grid with rocm (#32047)
Ray2020BD Apr 6, 2021
187bf41
optimize compilation of operators using eigen (#31851)
Avin0323 Apr 6, 2021
a17c369
fix fc doc (#32084)
Joejiong Apr 6, 2021
b8b82b7
Del cudnn6 code2 (#31986)
tianshuo78520a Apr 6, 2021
a881b4d
Struct SparseValue && Bug Fix (#31721)
seiriosPlus Apr 7, 2021
e625f88
print build summary (#32110)
iducn Apr 7, 2021
f5186c3
update name of develop whl package and upgrade gcc 4.8.2 to gcc 5.4 (…
pangyoki Apr 7, 2021
10af966
update the TraceLayer.save_inference_model method with add file suffi…
CtfGo Apr 7, 2021
363b25a
improve performance of DepthwiseConv(NHWC) (#31677)
OuyangChao Apr 7, 2021
1e60a0c
[3D-parallelism] Hybrid Model Parallelism (#32074)
JZ-LIANG Apr 7, 2021
8c7c53b
【NPU】Merge ascend GE&distributed code by 0208 from ascendrc (#31957)
frankwhzhang Apr 7, 2021
d91faf2
bugfix for unit test test_segment_ops (#32116)
windstamp Apr 7, 2021
e09f4db
Check added ut on windows (#31826)
XieYunshen Apr 7, 2021
4935b8e
move graph files (#32103)
seemingwang Apr 7, 2021
297290a
add uint8 type for flatten op (#32120)
danleifeng Apr 7, 2021
f74f976
fix the XXX_GRAD_CASE bug by HexToString (#32004)
HexToString Apr 8, 2021
7230203
fix bug (#32135)
ForFishes Apr 8, 2021
6e65fe0
The unsupported_fp16_list using in AMP will be created automatically …
wzzju Apr 8, 2021
5434496
4D Hybrid Parallelism (#32134)
JZ-LIANG Apr 8, 2021
e45c3fa
Add LayerDict class (#31951)
MingMingShangTian Apr 8, 2021
1bae1e7
Support converting the model from fp32 to fp16 (#32112)
juncaipeng Apr 8, 2021
3822247
[ROCM] update rocm skip ut list, test=develop (#32149)
cl87 Apr 9, 2021
dabaca0
Candidate fix to #31992 (#32136)
jczaja Apr 9, 2021
55730d9
[Dy2Stat] Support DictCmp and zip grammer (#32159)
Aurelius84 Apr 9, 2021
d815fbf
[CustomOp]Support MacOS platform and Remove libpaddle_custom_op.so de…
Aurelius84 Apr 9, 2021
95122eb
Advoid CPU -> CPU memory copy when start, end, step is already on CPU…
Xreki Apr 9, 2021
4636d13
[Dy2Stat] Fix undefined var used in For (#32153)
Aurelius84 Apr 9, 2021
a73cb67
fix unittest timeour (#32161)
shangzhizhou Apr 9, 2021
ccf5709
[NPU] cherry-pick basic NPU components/allocator/operator/executor su…
zhiqiu Apr 9, 2021
ec2ffb6
make high precision for avg_pool and adaptive_avg_pool when data_type…
AnnaTrainingG Apr 9, 2021
afa3720
Ci py3 gcc5.4 (#32045)
tianshuo78520a Apr 10, 2021
f8bab5b
Optimize the performance of the forward of log_softmax when axis is -…
AshburnLee Apr 10, 2021
a2387ef
fix concat_grad on kunlun (#32151)
tangzhiyi11 Apr 12, 2021
80698ca
remove PYTHON_ABI, test=document_fix (#32190)
Avin0323 Apr 12, 2021
af374ae
follow comments to refine PR 32144 (#32174)
zhiqiu Apr 12, 2021
d8afe40
Optimization of bilinear backward OP CUDA kernel. (#30950)
JamesLim-sy Apr 12, 2021
bd2a4e2
[ROCM] fix some unittests (#32129)
ronny1996 Apr 12, 2021
bb3b790
[CustomOp]Fix description of supporting MacOS (#32192)
Aurelius84 Apr 12, 2021
8dacfb5
Optimize the process of obtaining prec_list on windows (#32123)
XieYunshen Apr 12, 2021
4b5cb22
[Rocm] fix python test of multinomial (#32158)
Ray2020BD Apr 12, 2021
0624ea5
polish custom api content for performence (#32209)
chenwhql Apr 12, 2021
4a09c1a
run the sample codes added by `add_sample_code` in ops.py (#31863)
wadefelix Apr 13, 2021
fdf63b4
optimize check_finite_and_unscale_op by fused kernel, test=develop (#…
thisjiang Apr 13, 2021
693c762
[ROCM] fix depth conv2d in rocm, test=develop (#32170)
qili93 Apr 13, 2021
6e946e9
add layer.to api (#32040)
MingMingShangTian Apr 13, 2021
7ab47e8
Fix prec on windows for long args (#32218)
XieYunshen Apr 13, 2021
1d5d3e4
add statistics_UT_resource.sh for imporving UT parallel level (#32220)
zhwesky2010 Apr 13, 2021
b9e543f
upgrade to oneDNN2.2.1 (fix when prim descriptor or attr contain NaN)…
lidanqing-intel Apr 13, 2021
cb81826
extend multiclass_nms unittest timeout threshold (#32214)
cryoco Apr 13, 2021
4281eb4
add new post-quant methods (#32208)
XGZhang11 Apr 14, 2021
f4b2ce4
fix expand op lack of float16 (#32238)
HexToString Apr 14, 2021
95939b5
add common dtypes as paddle's dtypes (#32012)
Apr 14, 2021
279b653
Add model benchmark ci (#32247)
xiegegege Apr 14, 2021
995b5f2
fix matrix_inverse_op with rocm (#32128)
Ray2020BD Apr 14, 2021
22ea4c3
Delete grpc.cmake/distribeted/distributed_ops (#32166)
tianshuo78520a Apr 14, 2021
f3e49c4
Fix rocm cmake (#32230)
qili93 Apr 14, 2021
7ba85ac
Add inner register backward hook method for Tensor (#32171)
chenwhql Apr 14, 2021
8552a18
[Paddle-TRT] Add check for TRT runtime dynamic shape (#32155)
cryoco Apr 14, 2021
63abd50
softmax reconstruction and optimization (#31821)
xingfeng01 Apr 14, 2021
7b9fcac
add marco cond for multi function (#32239)
chenwhql Apr 14, 2021
3ac6c18
adds new CPU kernel for SGD op supporting BF16 data type (#32162)
arogowie-intel Apr 14, 2021
3a804a0
Added oneDNN reduce_op FWD kernel (#31816)
jakpiase Apr 14, 2021
7da4455
support the bool tensor and scalar (#32272)
wawltor Apr 14, 2021
5dc0a6e
Optimize of backward of log_softmax when axis is -1 and dim_size <= 1…
AshburnLee Apr 14, 2021
69d8027
Optimize the bec_loss op to avoid copy input back to CPU. (#32265)
Xreki Apr 14, 2021
e6bc358
【NPU】Cherry-pick ascendrc ops code by 0325 to develop (#32197)
frankwhzhang Apr 15, 2021
0c037d2
fix test sync_with_cpp (#32212)
fangshuixun007 Apr 15, 2021
29f6522
Customizable Python Layer in Dygraph (#32130)
hbwx24 Apr 15, 2021
f946ba6
Fix some error message (#32169)
Kqnonrime Apr 15, 2021
cfdde0e
【Deepmd Support】add IsInitialized and tanh double grad (#32188)
JiabinYang Apr 15, 2021
668a0d3
support int for nearest_interp, test=develop (#32270)
tink2123 Apr 15, 2021
9f8c8f9
heterps support pscore (#32093)
Thunderbrook Apr 15, 2021
90133d2
[ROCM] bugfix for unit tests (#32258)
windstamp Apr 15, 2021
825d495
Correct typos (#32288)
AshburnLee Apr 15, 2021
a8c3a90
tree-based-model (#31696)
123malin Apr 15, 2021
fabdb43
Update hapi to support AMP (#31417)
LiuChiachi Apr 15, 2021
6da043e
support ernie trt-int8 for inference (#32232)
ceci3 Apr 16, 2021
03c9ecd
test=develop, fix index_wrapper's cmake depends(#32314)
123malin Apr 16, 2021
66d4622
[Hybrid Parallel] Add model parallel support in dygraph (#32248)
ForFishes Apr 16, 2021
2c18258
Unify the implementation of elementwise operation of same dimensions …
ZzSean Apr 18, 2021
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/---document-issue-.md
Original file line number Diff line number Diff line change
@@ -56,4 +56,4 @@ For example: no sample code; The sample code is not helpful; The sample code not
For example:Chinese API in this doc is inconsistent with English API, including params, description, sample code, formula, etc.

#### Other
For example: The doc link is broken; The doc page is missing; Dead link in docs.
For example: The doc link is broken; The doc page is missing; Dead link in docs.
29 changes: 18 additions & 11 deletions CMakeLists.txt
@@ -12,7 +12,8 @@
# See the License for the specific language governing permissions and
# limitations under the License

cmake_minimum_required(VERSION 3.10)
cmake_minimum_required(VERSION 3.15)
cmake_policy(VERSION 3.10)
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_CURRENT_SOURCE_DIR}/cmake")
set(PADDLE_SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR})
set(PADDLE_BINARY_DIR ${CMAKE_CURRENT_BINARY_DIR})
@@ -32,16 +33,19 @@ option(WITH_TENSORRT "Compile PaddlePaddle with NVIDIA TensorRT" OFF)
option(WITH_XPU "Compile PaddlePaddle with BAIDU KUNLUN XPU" OFF)
option(WITH_WIN_DUMP_DBG "Compile with windows core dump debug mode" OFF)
option(WITH_ASCEND "Compile PaddlePaddle with ASCEND" OFF)
option(WITH_ROCM "Compile PaddlePaddle with ROCM platform" OFF)
# NOTE(zhiqiu): WITH_ASCEND_CL can be compile on x86_64, so we can set WITH_ASCEND=OFF and WITH_ASCEND_CL=ON
# to develop some acl related functionality on x86
option(WITH_ASCEND_CL "Compile PaddlePaddle with ASCEND CL" ${WITH_ASCEND})
option(WITH_ASCEND_CXX11 "Compile PaddlePaddle with ASCEND and CXX11 ABI" OFF)
if (WITH_GPU AND WITH_XPU)
message(FATAL_ERROR "Error when compile GPU and XPU at the same time")
endif()
if (WITH_GPU AND WITH_ASCEND)
if (WITH_GPU AND WITH_ASCEND)
message(FATAL_ERROR "Error when compile GPU and ASCEND at the same time")
endif()
# cmake 3.12, 3.13, 3.14 will append gcc link options to nvcc, and nvcc doesn't recognize them.
if(WITH_GPU AND (${CMAKE_VERSION} VERSION_GREATER_EQUAL 3.12) AND (${CMAKE_VERSION} VERSION_LESS 3.15))
message(FATAL_ERROR "cmake ${CMAKE_VERSION} is not supported when WITH_GPU=ON because of bug https://cmake.org/pipermail/cmake/2018-September/068195.html. "
"You can use cmake 3.16 (recommended), 3.10, 3.11, 3.15 or 3.17. Please refer to the install document: https://cmake.org/install/")
if (WITH_GPU AND WITH_ROCM)
message(FATAL_ERROR "Error when compile CUDA and ROCM at the same time")
endif()

if(WITH_GPU AND NOT APPLE)
@@ -61,6 +65,9 @@ if(WITH_MUSL)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-error=deprecated-declarations -Wno-deprecated-declarations -Wno-error=pessimizing-move -Wno-error=deprecated-copy")
endif()

if(WITH_ASCEND AND NOT WITH_ASCEND_CXX11)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -D_GLIBCXX_USE_CXX11_ABI=0")
endif()

if(WIN32)
option(MSVC_STATIC_CRT "use static C Runtime library by default" ON)
@@ -165,8 +172,6 @@ option(WITH_DISTRIBUTE "Compile with distributed support" OFF)
option(WITH_BRPC_RDMA "Use brpc rdma as the rpc protocal" OFF)
option(ON_INFER "Turn on inference optimization and inference-lib generation" OFF)
################################ Internal Configurations #######################################
option(WITH_ROCM "Compile PaddlePaddle with ROCM platform" OFF)
option(WITH_RCCL "Compile PaddlePaddle with RCCL support" OFF)
option(WITH_NV_JETSON "Compile PaddlePaddle with NV JETSON" OFF)
option(WITH_PROFILER "Compile PaddlePaddle with GPU profiler and gperftools" OFF)
option(WITH_COVERAGE "Compile PaddlePaddle with code coverage" OFF)
@@ -179,12 +184,14 @@ option(WITH_XBYAK "Compile with xbyak support" ON)
option(WITH_CONTRIB "Compile the third-party contributation" OFF)
option(WITH_GRPC "Use grpc as the default rpc framework" ${WITH_DISTRIBUTE})
option(WITH_PSCORE "Compile with parameter server support" ${WITH_DISTRIBUTE})
option(WITH_HETERPS "Compile with heterps" OFF)
option(WITH_INFERENCE_API_TEST "Test fluid inference C++ high-level api interface" OFF)
option(PY_VERSION "Compile PaddlePaddle with python3 support" ${PY_VERSION})
option(WITH_DGC "Use DGC(Deep Gradient Compression) or not" ${WITH_DISTRIBUTE})
option(SANITIZER_TYPE "Choose the type of sanitizer, options are: Address, Leak, Memory, Thread, Undefined" OFF)
option(WITH_LITE "Compile Paddle Fluid with Lite Engine" OFF)
option(WITH_NCCL "Compile PaddlePaddle with NCCL support" ON)
option(WITH_RCCL "Compile PaddlePaddle with RCCL support" ON)
option(WITH_XPU_BKCL "Compile PaddlePaddle with BAIDU KUNLUN XPU BKCL" OFF)
option(WITH_CRYPTO "Compile PaddlePaddle with crypto support" ON)
option(WITH_ARM "Compile PaddlePaddle with arm support" OFF)
@@ -302,9 +309,9 @@ endif(WITH_ROCM)

if (NOT WITH_ROCM AND WITH_RCCL)
MESSAGE(WARNING
"Disable RCCL when compiling without GPU. Force WITH_RCCL=OFF.")
set(WITH_NCCL OFF CACHE STRING
"Disable RCCL when compiling without GPU" FORCE)
"Disable RCCL when compiling without ROCM. Force WITH_RCCL=OFF.")
set(WITH_RCCL OFF CACHE STRING
"Disable RCCL when compiling without ROCM" FORCE)
endif()

if(WITH_RCCL)
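The RCCL hunk above fixes a copy-paste slip: the old warning branch complained about RCCL but force-disabled `WITH_NCCL`. A minimal standalone sketch of the corrected guard pattern (variable names follow the diff):

```cmake
# Force-disable a dependent feature when its platform toggle is absent.
# FORCE overwrites the cached value so a stale setting from an earlier
# configure run cannot silently re-enable the feature.
if(NOT WITH_ROCM AND WITH_RCCL)
  message(WARNING "Disable RCCL when compiling without ROCM. Force WITH_RCCL=OFF.")
  set(WITH_RCCL OFF CACHE STRING "Disable RCCL when compiling without ROCM" FORCE)
endif()
```

The same pattern appears a few lines earlier for GPU/XPU and GPU/ASCEND mutual exclusion, where an outright `FATAL_ERROR` is used instead of a forced cache rewrite.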
16 changes: 14 additions & 2 deletions cmake/configure.cmake
@@ -82,6 +82,10 @@ if(WITH_ASCEND)
add_definitions(-DPADDLE_WITH_ASCEND)
endif()

if(WITH_ASCEND_CL)
add_definitions(-DPADDLE_WITH_ASCEND_CL)
endif()

if(WITH_XPU)
message(STATUS "Compile with XPU!")
add_definitions(-DPADDLE_WITH_XPU)
@@ -93,13 +97,18 @@ if(WITH_GPU)

FIND_PACKAGE(CUDA REQUIRED)

if(${CMAKE_CUDA_COMPILER_VERSION} VERSION_LESS 7)
message(FATAL_ERROR "Paddle needs CUDA >= 7.0 to compile")
if(${CMAKE_CUDA_COMPILER_VERSION} VERSION_LESS 10.1)
message(FATAL_ERROR "Paddle needs CUDA >= 10.1 to compile")
endif()

if(NOT CUDNN_FOUND)
message(FATAL_ERROR "Paddle needs cudnn to compile")
endif()

if(${CUDNN_MAJOR_VERSION} VERSION_LESS 7)
message(FATAL_ERROR "Paddle needs CUDNN >= 7.0 to compile")
endif()
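The CUDA and cuDNN gates above rely on CMake's `VERSION_LESS` operator, which splits dotted versions on the dots and compares the components numerically rather than lexically; a small self-contained illustration:

```cmake
# VERSION_LESS compares component-wise, so "10.2" is less than "10.10".
# A plain string comparison gets this wrong ("10.2" sorts after "10.10").
if("10.2" VERSION_LESS "10.10")
  message(STATUS "component-wise: 10.2 is less than 10.10")
endif()
if(NOT "10.2" STRLESS "10.10")
  message(STATUS "lexicographic: 10.2 is not less than 10.10")
endif()
```

This is why the new `VERSION_LESS 10.1` check is safe against future CUDA releases such as 10.10 or 11.0, where a string comparison could misfire.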

if(CUPTI_FOUND)
include_directories(${CUPTI_INCLUDE_DIR})
add_definitions(-DPADDLE_WITH_CUPTI)
@@ -164,6 +173,9 @@ if(WITH_PSCORE)
add_definitions(-DPADDLE_WITH_PSCORE)
endif()

if(WITH_HETERPS)
add_definitions(-DPADDLE_WITH_HETERPS)
endif()

if(WITH_GRPC)
add_definitions(-DPADDLE_WITH_GRPC)
39 changes: 7 additions & 32 deletions cmake/cuda.cmake
@@ -6,15 +6,9 @@ endif()
if (WITH_NV_JETSON)
add_definitions(-DWITH_NV_JETSON)
set(paddle_known_gpu_archs "53 62 72")
set(paddle_known_gpu_archs7 "53")
set(paddle_known_gpu_archs8 "53 62")
set(paddle_known_gpu_archs9 "53 62")
set(paddle_known_gpu_archs10 "53 62 72")
else()
set(paddle_known_gpu_archs "30 35 50 52 60 61 70")
set(paddle_known_gpu_archs7 "30 35 50 52")
set(paddle_known_gpu_archs8 "30 35 50 52 60 61")
set(paddle_known_gpu_archs9 "30 35 50 52 60 61 70")
set(paddle_known_gpu_archs "35 50 52 60 61 70 75 80")
set(paddle_known_gpu_archs10 "35 50 52 60 61 70 75")
set(paddle_known_gpu_archs11 "52 60 61 70 75 80")
endif()
Expand Down Expand Up @@ -74,7 +68,7 @@ endfunction()
# select_nvcc_arch_flags(out_variable)
function(select_nvcc_arch_flags out_variable)
# List of arch names
set(archs_names "Kepler" "Maxwell" "Pascal" "Volta" "Turing" "All" "Manual")
set(archs_names "Kepler" "Maxwell" "Pascal" "Volta" "Turing" "Ampere" "All" "Manual")
set(archs_name_default "Auto")
list(APPEND archs_names "Auto")

@@ -108,6 +102,8 @@ function(select_nvcc_arch_flags out_variable)
set(cuda_arch_bin "70")
elseif(${CUDA_ARCH_NAME} STREQUAL "Turing")
set(cuda_arch_bin "75")
elseif(${CUDA_ARCH_NAME} STREQUAL "Ampere")
set(cuda_arch_bin "80")
elseif(${CUDA_ARCH_NAME} STREQUAL "All")
set(cuda_arch_bin ${paddle_known_gpu_archs})
elseif(${CUDA_ARCH_NAME} STREQUAL "Auto")
@@ -158,25 +154,7 @@ endfunction()
endfunction()

message(STATUS "CUDA detected: " ${CMAKE_CUDA_COMPILER_VERSION})
if (${CMAKE_CUDA_COMPILER_VERSION} LESS 7.0)
set(paddle_known_gpu_archs ${paddle_known_gpu_archs})
elseif (${CMAKE_CUDA_COMPILER_VERSION} LESS 8.0) # CUDA 7.x
set(paddle_known_gpu_archs ${paddle_known_gpu_archs7})
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -D_MWAITXINTRIN_H_INCLUDED")
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -D__STRICT_ANSI__")
elseif (${CMAKE_CUDA_COMPILER_VERSION} LESS 9.0) # CUDA 8.x
set(paddle_known_gpu_archs ${paddle_known_gpu_archs8})
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -D_MWAITXINTRIN_H_INCLUDED")
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -D__STRICT_ANSI__")
# CUDA 8 may complain that sm_20 is no longer supported. Suppress the
# warning for now.
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Wno-deprecated-gpu-targets")
elseif (${CMAKE_CUDA_COMPILER_VERSION} LESS 10.0) # CUDA 9.x
set(paddle_known_gpu_archs ${paddle_known_gpu_archs9})
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -D_MWAITXINTRIN_H_INCLUDED")
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -D__STRICT_ANSI__")
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Wno-deprecated-gpu-targets")
elseif (${CMAKE_CUDA_COMPILER_VERSION} LESS 11.0) # CUDA 10.x
if (${CMAKE_CUDA_COMPILER_VERSION} LESS 11.0) # CUDA 10.x
set(paddle_known_gpu_archs ${paddle_known_gpu_archs10})
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -D_MWAITXINTRIN_H_INCLUDED")
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -D__STRICT_ANSI__")
Expand Down Expand Up @@ -206,14 +184,11 @@ select_nvcc_arch_flags(NVCC_FLAGS_EXTRA)
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} ${NVCC_FLAGS_EXTRA}")
message(STATUS "NVCC_FLAGS_EXTRA: ${NVCC_FLAGS_EXTRA}")
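For context, `select_nvcc_arch_flags` ultimately turns the chosen compute capabilities (the new "Ampere" name maps to `cuda_arch_bin "80"`) into `-gencode` flags for nvcc. A simplified sketch of that mapping; the helper name `arch_bins_to_gencode` is illustrative and not part of the PR:

```cmake
# Illustrative helper: expand a list of SM versions into nvcc -gencode flags,
# e.g. 80 -> "-gencode arch=compute_80,code=sm_80" (Ampere).
function(arch_bins_to_gencode out_var)
  set(flags "")
  foreach(arch ${ARGN})
    set(flags "${flags} -gencode arch=compute_${arch},code=sm_${arch}")
  endforeach()
  set(${out_var} "${flags}" PARENT_SCOPE)
endfunction()

arch_bins_to_gencode(NVCC_ARCH_FLAGS 70 75 80)  # Volta, Turing, Ampere
```

The real function additionally supports "Auto" (probing the installed GPUs) and embeds PTX for the newest architecture so future GPUs can JIT-compile the kernels.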

# Set C++11 support
# Set C++14 support
set(CUDA_PROPAGATE_HOST_FLAGS OFF)
# Release/Debug flags set by cmake. Such as -O3 -g -DNDEBUG etc.
# So, don't set these flags here.
if (NOT WIN32) # windows msvc2015 support c++11 natively.
# -std=c++11 -fPIC not recoginize by msvc, -Xcompiler will be added by cmake.
set(CMAKE_CUDA_STANDARD 11)
endif(NOT WIN32)
set(CMAKE_CUDA_STANDARD 14)
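With the move to C++14, the old `if (NOT WIN32)` special case disappears because modern MSVC accepts the requested standard as well. A hedged sketch of the equivalent settings; the `_REQUIRED` line is an illustrative addition, not in the diff:

```cmake
# Request C++14 for CUDA device code on every platform; CMake translates
# this into the right flag for nvcc and the host compiler.
set(CMAKE_CUDA_STANDARD 14)
set(CMAKE_CUDA_STANDARD_REQUIRED ON)  # fail early instead of silently falling back
```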

# (Note) For windows, if delete /W[1-4], /W1 will be added defaultly and conflic with -w
# So replace /W[1-4] with /W0
2 changes: 1 addition & 1 deletion cmake/cudnn.cmake
@@ -94,7 +94,7 @@ macro(find_cudnn_version cudnn_header_file)
"${CUDNN_MAJOR_VERSION} * 1000 +
${CUDNN_MINOR_VERSION} * 100 + ${CUDNN_PATCHLEVEL_VERSION}")
message(STATUS "Current cuDNN header is ${cudnn_header_file} "
"Current cuDNN version is v${CUDNN_MAJOR_VERSION}.${CUDNN_MINOR_VERSION}. ")
"Current cuDNN version is v${CUDNN_MAJOR_VERSION}.${CUDNN_MINOR_VERSION}.${CUDNN_PATCHLEVEL_VERSION}. ")
endif()
endif()
endmacro()
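The macro above folds the three cuDNN version parts into one comparable integer before printing the now-complete `major.minor.patchlevel` string; the arithmetic it performs looks like this (the version values are an example):

```cmake
# Example: cuDNN 7.6.5 -> 7 * 1000 + 6 * 100 + 5 = 7605, a single integer
# that preserves ordering for later version comparisons.
set(CUDNN_MAJOR_VERSION 7)
set(CUDNN_MINOR_VERSION 6)
set(CUDNN_PATCHLEVEL_VERSION 5)
math(EXPR CUDNN_VERSION
     "${CUDNN_MAJOR_VERSION} * 1000 + ${CUDNN_MINOR_VERSION} * 100 + ${CUDNN_PATCHLEVEL_VERSION}")
message(STATUS "Current cuDNN version is v${CUDNN_MAJOR_VERSION}.${CUDNN_MINOR_VERSION}.${CUDNN_PATCHLEVEL_VERSION}.")
```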
111 changes: 65 additions & 46 deletions cmake/external/ascend.cmake
@@ -12,50 +12,69 @@
# See the License for the specific language governing permissions and
# limitations under the License.

INCLUDE(ExternalProject)

SET(ASCEND_PROJECT "extern_ascend")
IF((NOT DEFINED ASCEND_VER) OR (NOT DEFINED ASCEND_URL))
MESSAGE(STATUS "use pre defined download url")
SET(ASCEND_VER "0.1.1" CACHE STRING "" FORCE)
SET(ASCEND_NAME "ascend" CACHE STRING "" FORCE)
SET(ASCEND_URL "http://paddle-ascend.bj.bcebos.com/ascend.tar.gz" CACHE STRING "" FORCE)
ENDIF()
MESSAGE(STATUS "ASCEND_NAME: ${ASCEND_NAME}, ASCEND_URL: ${ASCEND_URL}")
SET(ASCEND_SOURCE_DIR "${THIRD_PARTY_PATH}/ascend")
SET(ASCEND_DOWNLOAD_DIR "${ASCEND_SOURCE_DIR}/src/${ASCEND_PROJECT}")
SET(ASCEND_DST_DIR "ascend")
SET(ASCEND_INSTALL_ROOT "${THIRD_PARTY_PATH}/install")
SET(ASCEND_INSTALL_DIR ${ASCEND_INSTALL_ROOT}/${ASCEND_DST_DIR})
SET(ASCEND_ROOT ${ASCEND_INSTALL_DIR})
SET(ASCEND_INC_DIR ${ASCEND_ROOT}/include)
SET(ASCEND_LIB_DIR ${ASCEND_ROOT}/lib)
SET(ASCEND_LIB ${ASCEND_LIB_DIR}/libge_runner.so)
SET(ASCEND_GRAPH_LIB ${ASCEND_LIB_DIR}/libgraph.so)
SET(CMAKE_INSTALL_RPATH "${CMAKE_INSTALL_RPATH}" "${ASCEND_ROOT}/lib")

INCLUDE_DIRECTORIES(${ASCEND_INC_DIR})
FILE(WRITE ${ASCEND_DOWNLOAD_DIR}/CMakeLists.txt
"PROJECT(ASCEND)\n"
"cmake_minimum_required(VERSION 3.0)\n"
"install(DIRECTORY ${ASCEND_NAME}/include ${ASCEND_NAME}/lib \n"
" DESTINATION ${ASCEND_DST_DIR})\n")
ExternalProject_Add(
${ASCEND_PROJECT}
${EXTERNAL_PROJECT_LOG_ARGS}
PREFIX ${ASCEND_SOURCE_DIR}
DOWNLOAD_DIR ${ASCEND_DOWNLOAD_DIR}
DOWNLOAD_COMMAND wget --no-check-certificate ${ASCEND_URL} -c -q -O ${ASCEND_NAME}.tar.gz
&& tar zxvf ${ASCEND_NAME}.tar.gz
DOWNLOAD_NO_PROGRESS 1
UPDATE_COMMAND ""
CMAKE_ARGS -DCMAKE_INSTALL_PREFIX=${ASCEND_INSTALL_ROOT}
CMAKE_CACHE_ARGS -DCMAKE_INSTALL_PREFIX:PATH=${ASCEND_INSTALL_ROOT}
)
ADD_LIBRARY(ascend SHARED IMPORTED GLOBAL)
SET_PROPERTY(TARGET ascend PROPERTY IMPORTED_LOCATION ${ASCEND_LIB})

ADD_LIBRARY(ascend_graph SHARED IMPORTED GLOBAL)
SET_PROPERTY(TARGET ascend_graph PROPERTY IMPORTED_LOCATION ${ASCEND_GRAPH_LIB})
ADD_DEPENDENCIES(ascend ascend_graph ${ASCEND_PROJECT})

#NOTE: Logic is from
# https://github.com/mindspore-ai/graphengine/blob/master/CMakeLists.txt
if(DEFINED ENV{ASCEND_CUSTOM_PATH})
set(ASCEND_DIR $ENV{ASCEND_CUSTOM_PATH})
else()
set(ASCEND_DIR /usr/local/Ascend)
endif()

if(WITH_ASCEND)
set(ASCEND_DRIVER_DIR ${ASCEND_DIR}/driver/lib64)
set(ASCEND_DRIVER_COMMON_DIR ${ASCEND_DIR}/driver/lib64/common)
set(ASCEND_DRIVER_SHARE_DIR ${ASCEND_DIR}/driver/lib64/share)
set(ASCEND_RUNTIME_DIR ${ASCEND_DIR}/fwkacllib/lib64)
set(ASCEND_ATC_DIR ${ASCEND_DIR}/atc/lib64)
set(ASCEND_ACL_DIR ${ASCEND_DIR}/acllib/lib64)
set(STATIC_ACL_LIB ${ASCEND_ACL_DIR})

set(ASCEND_MS_RUNTIME_PATH ${ASCEND_RUNTIME_DIR} ${ASCEND_ACL_DIR} ${ASCEND_ATC_DIR})
set(ASCEND_MS_DRIVER_PATH ${ASCEND_DRIVER_DIR} ${ASCEND_DRIVER_COMMON_DIR})
set(ATLAS_RUNTIME_DIR ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64)
set(ATLAS_RUNTIME_INC_DIR ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/include)
set(ATLAS_ACL_DIR ${ASCEND_DIR}/ascend-toolkit/latest/acllib/lib64)
set(ATLAS_ATC_DIR ${ASCEND_DIR}/ascend-toolkit/latest/atc/lib64)
set(ATLAS_MS_RUNTIME_PATH ${ATLAS_RUNTIME_DIR} ${ATLAS_ACL_DIR} ${ATLAS_ATC_DIR})

set(atlas_graph_lib ${ATLAS_RUNTIME_DIR}/libgraph.so)
set(atlas_ge_runner_lib ${ATLAS_RUNTIME_DIR}/libge_runner.so)
set(atlas_acl_lib ${ATLAS_RUNTIME_DIR}/libascendcl.so)
INCLUDE_DIRECTORIES(${ATLAS_RUNTIME_INC_DIR})

if(EXISTS ${ATLAS_RUNTIME_INC_DIR}/graph/ascend_string.h)
add_definitions(-DPADDLE_WITH_ASCEND_STRING)
endif()

ADD_LIBRARY(ascend_ge SHARED IMPORTED GLOBAL)
SET_PROPERTY(TARGET ascend_ge PROPERTY IMPORTED_LOCATION ${atlas_ge_runner_lib})

ADD_LIBRARY(ascend_graph SHARED IMPORTED GLOBAL)
SET_PROPERTY(TARGET ascend_graph PROPERTY IMPORTED_LOCATION ${atlas_graph_lib})

ADD_LIBRARY(atlas_acl SHARED IMPORTED GLOBAL)
SET_PROPERTY(TARGET atlas_acl PROPERTY IMPORTED_LOCATION ${atlas_acl_lib})

add_custom_target(extern_ascend DEPENDS ascend_ge ascend_graph atlas_acl)
endif()

if(WITH_ASCEND_CL)
set(ASCEND_CL_DIR ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64)

set(ascendcl_lib ${ASCEND_CL_DIR}/libascendcl.so)
set(acl_op_compiler_lib ${ASCEND_CL_DIR}/libacl_op_compiler.so)
set(ASCEND_CL_INC_DIR ${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/include)

message(STATUS "ASCEND_CL_INC_DIR ${ASCEND_CL_INC_DIR}")
message(STATUS "ASCEND_CL_DIR ${ASCEND_CL_DIR}")
INCLUDE_DIRECTORIES(${ASCEND_CL_INC_DIR})

ADD_LIBRARY(ascendcl SHARED IMPORTED GLOBAL)
SET_PROPERTY(TARGET ascendcl PROPERTY IMPORTED_LOCATION ${ascendcl_lib})

ADD_LIBRARY(acl_op_compiler SHARED IMPORTED GLOBAL)
SET_PROPERTY(TARGET acl_op_compiler PROPERTY IMPORTED_LOCATION ${acl_op_compiler_lib})
add_custom_target(extern_ascend_cl DEPENDS ascendcl acl_op_compiler)

endif()
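The rewritten ascend.cmake stops downloading a prebuilt tarball via `ExternalProject_Add` and instead wraps the locally installed Ascend toolkit libraries as imported targets. The core pattern, sketched standalone (`my_consumer` and `main.cc` are hypothetical):

```cmake
# Wrap an existing shared library in a CMake target so consumers can
# link it by name instead of by absolute path.
add_library(ascendcl SHARED IMPORTED GLOBAL)
set_property(TARGET ascendcl PROPERTY IMPORTED_LOCATION
             "${ASCEND_DIR}/ascend-toolkit/latest/fwkacllib/lib64/libascendcl.so")

# Hypothetical consumer: linking the imported target pulls in the .so
# location at link time without any find_library calls.
add_executable(my_consumer main.cc)
target_link_libraries(my_consumer PRIVATE ascendcl)
```

The `add_custom_target(extern_ascend_cl DEPENDS ...)` at the end of the diff keeps the old `extern_*` dependency names working for build rules that still reference them.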
4 changes: 2 additions & 2 deletions cmake/external/brpc.cmake
@@ -39,9 +39,9 @@ set(prefix_path "${THIRD_PARTY_PATH}/install/gflags|${THIRD_PARTY_PATH}/install/
ExternalProject_Add(
extern_brpc
${EXTERNAL_PROJECT_LOG_ARGS}
# TODO(gongwb): change to de newst repo when they changed.
# TODO(gongwb): change to de newst repo when they changed
GIT_REPOSITORY "https://github.com/wangjiawei04/brpc"
GIT_TAG "6d79e0b17f25107c35b705ea58d888083f59ff47"
GIT_TAG "e203afb794caf027da0f1e0776443e7d20c0c28e"
PREFIX ${BRPC_SOURCES_DIR}
UPDATE_COMMAND ""
CMAKE_ARGS -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}