Add StringTensor #39830

Merged
merged 65 commits into from
Mar 26, 2022

@joey12300 (Contributor) commented Feb 23, 2022

PR types

New features

PR changes

OPs

Describe

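This PR only modifies C++-layer code; it does not touch the Python API yet.

  1. Add the StringTensor data structure at the C++ layer.
  2. Add one string kernel, case conversion (lower/upper), which runs on both CPU and GPU.
  3. Add a generation script for the string C++ API, so that the string C++ API can be generated from a yaml configuration.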
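
To make items 1 and 2 concrete, here is a minimal CPU-only sketch of a string container and a case-conversion routine. It is not the code added by this PR: `SimpleStringTensor`, `CaseMode`, and `ConvertCase` are hypothetical names chosen for illustration, the layout (one `std::string` per element) is an assumption, and the conversion is ASCII-only.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a string tensor: a shape plus one string per
// element, stored in row-major order. Not Paddle's actual StringTensor.
struct SimpleStringTensor {
  std::vector<std::size_t> shape;
  std::vector<std::string> data;

  SimpleStringTensor(std::vector<std::size_t> s, std::vector<std::string> d)
      : shape(std::move(s)), data(std::move(d)) {
    // The element count implied by the shape must match the stored strings.
    std::size_t numel = 1;
    for (std::size_t dim : shape) numel *= dim;
    assert(numel == data.size());
  }
};

enum class CaseMode { kLower, kUpper };

// Hypothetical CPU "kernel": returns a tensor of the same shape with every
// element converted to the requested case. Only ASCII letters change in the
// default "C" locale; other bytes are copied through unchanged.
SimpleStringTensor ConvertCase(const SimpleStringTensor& in, CaseMode mode) {
  SimpleStringTensor out = in;
  for (std::string& s : out.data) {
    std::transform(s.begin(), s.end(), s.begin(), [mode](unsigned char c) {
      return static_cast<char>(mode == CaseMode::kLower ? std::tolower(c)
                                                        : std::toupper(c));
    });
  }
  return out;
}
```

Calling `ConvertCase(t, CaseMode::kUpper)` on a 2x2 tensor returns a new 2x2 tensor and leaves `t` untouched. A GPU variant would need a device-friendly string layout, which this sketch does not attempt.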

TODO:

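  1. Add the dynamic-graph string API and generation of the Python-C interface.
  2. Add Pybind bindings for StringTensor so that a StringTensor can be created from the Python layer.
  3. Add more string operators.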

Optimize the build size

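Background

The build directory generated by the Coverage CI for this PR is 140GB, an increase of 4GB, which exceeds that CI's 3GB threshold. The size of the generated build directory therefore needs to be reduced.

Optimization

  • Remove unnecessary header files so that fewer unneeded object files are pulled in at link time, which reduces the size of the generated executables.

Analysis showed that the generated strings_api.cc included an unnecessary header, paddle/phi/kernels/declarations.h. This header brings in the declarations of all Paddle kernels, and their implementations are then linked in, producing a very large executable. After removing the header, in a local build the unit-test executables test_strings_empty_api and test_strings_lower_upper_api shrank from 241MB to 10MB, comparable to the other unit-test executables. The build directory generated by the Coverage CI also shrank from 140GB to 137GB, and the increase dropped from 4GB to 1GB.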
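
The link-time effect described above can be modeled in a single file. The sketch below assumes that each kernel declaration references a per-kernel symbol through a static initializer, so every declared kernel must be kept by the linker. The macros `DEFINE_KERNEL_SYMBOL` and `DECLARE_KERNEL`, the registry `ReferencedKernels`, and the kernel names are all hypothetical and are not taken from Paddle's headers.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical registry: records each kernel whose symbol was referenced by a
// declaration. In a real build, a referenced symbol forces the linker to keep
// the object file that defines it.
inline std::vector<std::string>& ReferencedKernels() {
  static std::vector<std::string> names;
  return names;
}

inline bool IsReferenced(const std::string& name) {
  const std::vector<std::string>& v = ReferencedKernels();
  return std::find(v.begin(), v.end(), name) != v.end();
}

// Hypothetical: defines the per-kernel symbol. In a real project each of
// these would live in its own object file inside a static library.
#define DEFINE_KERNEL_SYMBOL(name)        \
  int TouchKernelSymbol_##name() {        \
    ReferencedKernels().push_back(#name); \
    return 0;                             \
  }

// Hypothetical: declaring a kernel calls its symbol from a static
// initializer, which is what makes the declaration a hard link dependency.
#define DECLARE_KERNEL(name)             \
  extern int TouchKernelSymbol_##name(); \
  static int declared_kernel_##name = TouchKernelSymbol_##name()

DEFINE_KERNEL_SYMBOL(strings_lower)
DEFINE_KERNEL_SYMBOL(strings_upper)
DEFINE_KERNEL_SYMBOL(conv2d)
DEFINE_KERNEL_SYMBOL(matmul)

// A narrow include set declares only what the string API needs. An umbrella
// header would instead contain one DECLARE_KERNEL line per kernel.
DECLARE_KERNEL(strings_lower);
DECLARE_KERNEL(strings_upper);
```

At startup `ReferencedKernels()` holds only the two string kernels. Adding a `DECLARE_KERNEL` line for every kernel models what an all-kernels header does to each executable that includes it.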

@paddle-bot-old

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

Conflicts:
	cmake/phi.cmake
	paddle/phi/CMakeLists.txt
	paddle/phi/api/CMakeLists.txt
	paddle/phi/api/lib/CMakeLists.txt
	paddle/phi/api/lib/utils/CMakeLists.txt
	paddle/phi/common/data_type.h
	paddle/phi/core/kernel_utils.h
	paddle/phi/tests/api/CMakeLists.txt
	paddle/phi/tests/kernels/CMakeLists.txt
wawltor
wawltor previously approved these changes Mar 22, 2022
Contributor

@wawltor wawltor left a comment

LGTM

chenwhql
chenwhql previously approved these changes Mar 23, 2022
Contributor

@chenwhql chenwhql left a comment

LGTM

@@ -0,0 +1,112 @@
/* Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
Contributor

Nit: the license header in a few files is still missing a line break.

@joey12300 joey12300 dismissed stale reviews from chenwhql and wawltor via 8d141eb March 23, 2022 06:11
XiaoguangHu01
XiaoguangHu01 previously approved these changes Mar 24, 2022
Contributor

@XiaoguangHu01 XiaoguangHu01 left a comment

LGTM

Member

@ZeyuChen ZeyuChen left a comment

LGTM

@ZeyuChen ZeyuChen merged commit 0695e1a into PaddlePaddle:develop Mar 26, 2022
chenwhql added a commit to chenwhql/Paddle that referenced this pull request Mar 28, 2022
6 participants