-
Notifications
You must be signed in to change notification settings - Fork 5.9k
【Hackathon 9th No.109】[CppExtension] Support build Custom OP in setuptools 80+ and support install extension via pip install . --no-build-isolation
#76008
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
你的PR提交成功,感谢你对开源项目的贡献! |
Codecov Report❌ Patch coverage is
❌ Your patch status has failed because the patch coverage (7.75%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## develop #76008 +/- ##
==========================================
Coverage ? 7.75%
==========================================
Files ? 1
Lines ? 116
Branches ? 0
==========================================
Hits ? 9
Misses ? 107
Partials ? 0 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
SigureMo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
顺师傅牛啊,我找时间看看
呃 ... ... 这道题你的啊 ~ 🤣 我还没准备好提交 review ~ 🤭 虽然测试过了,不过,应该还是有问题 ~ 整体逻辑是先 custom_write_stub,然后把包整理整理放到对应的目录中 ~ 我尝试在本地 install,它把包构建在当前目录,但没有把包移到 /usr/local/lib/python3.9/dist-packages ,所以 pip list 是看不到的,这里应该有问题 ~ 之所有测试能过,也是因为包在当前目录 ~ 而在其他目录 import 这个 module 会出错 ~ 第一次 commit 的时候,import 是没问题的,但是 CI 错了 ~ GPT-5 的定位: 我再看看具体要怎么解决 ~ 🫠 |
Update 20251028修改如下:
以下是测试日志: 看看有啥要注意的不? |
SigureMo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
在 PR 描述里先说明下思路吧,这个题目是要求 RFC 的
| x for x in os.listdir(site_dir) if 'custom_cpp_extension' in x | ||
| ] | ||
| assert len(custom_egg_path) == 1, ( | ||
| assert len(custom_egg_path) == 2, ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setuptools 在 80.9.0 和 57.1.0 中测试通过。不太清楚为什么之前的测试用例这里是 1 。
奇怪,这些单测在 CI 中是没跑么?
| # Setting metadata_version >= 2.1 (introduced in PEP 566) forces setuptools | ||
| # to create .dist-info directories instead of .egg-info, which allows pip | ||
| # to properly detect and list installed packages via `pip list`. | ||
| # Version 2.1 is sufficient for this purpose and maintains compatibility. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
➜ cpp_extension git:(setuptools80) ✗ ls /usr/local/lib/python3.9/dist-packages | grep mix_relu
mix_relu_extension
mix_relu_extension-0.0.0-py3.9.egg-info
可是这里看还是 egg-info 而不是 dist-info 呀
| # Write stub; it will reference the _pd_ renamed resource at import time | ||
| custom_write_stub(so_name, pyfile) | ||
| except Exception as e: | ||
| print(f"Warning: failed to generate python api file: {e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里应该是 warning 么?不应该直接报错么?
| def finalize_options(self) -> None: | ||
| super().finalize_options() | ||
| try: | ||
| import sys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import 放在文件开始
| try: | ||
| candidates.extend(site.getsitepackages()) | ||
| except Exception: | ||
| pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这些 try-except 有必要么?感觉满满的 AI 风格
Update 20251103旧代码不能使用
rfc : PaddlePaddle/community#1174 测试验证过程在 rfc 中有写 ~
旧的测试没问题,我之前改的时候没注意,实际上已经是产生两个目录了。
p.s. 这两天为了兼容 setuptools 的版本绕了很多弯路,后来发现,根本不需要考虑 setuptools 的版本兼容问题,把 wheel 的安装修复就可以了 ... ... 旧代码当时应该没有考虑 wheel 的安装,所以用的覆盖 bdist_egg.write_stub 的方式,目前看来完全没有必要 ... ... |
|
本地基于 demo case 测了下目测没啥问题,我找时间看看具体实现~ |
|
这周末 review! |
SigureMo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
整体我觉得 ok,走不到的逻辑,该清理的我觉得可以清理,大不了 revert
等会或者周一我在 FastDeploy 这种重度依赖自定义算子的场景再测一下,如果没问题的话我觉得就可以直接合了
| # to properly detect and list installed packages via `pip list`. | ||
| # Version 2.1 is sufficient for this purpose and maintains compatibility. | ||
| if 'metadata_version' not in attr: | ||
| attr['metadata_version'] = '2.1' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里我有点担心的是,最新 metadata version 已经到了 2.51,2.4 也已经基本是普及的状态,这里我们写死 2.1 会不会反而是阻碍现代化 metadata?
话说这里说的生成 .egg-info 是 2.1 之下的行为还是 2.1 之上的行为(按我理解是前者)?
Footnotes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
现在也说不好这个设置会对哪个版本的 setuptools 起作用,我理解的是,建议最低版本是 2.1 ,所以这里写了个判断,如果没有这个参数再进行设置,主要是针对旧的 setuptools 起作用 ~ 实在不行,就把这部分先删掉?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
嗯 可以删掉后在低版本测试下,如果没问题(能正常 import),.egg-info 还是 .dist-info 均可
| 3) rename the compiled library to *_pd_.so to avoid shadowing the python stub | ||
| Note: This is primarily for legacy 'python setup.py install' usage. | ||
| For modern 'pip install', the BdistWheelCommand handles file layout. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BdistWheelCommand 是不需要 extend 一些行为的是么?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
不需要了 ~ 这里是当时针对不同 setuptools 版本进行调试的时候遗留下来的注释 ~ 我把这部分注释删了吧 ~
pip install . --no-build-isolation
|
FastDeploy 上好像还有些问题 这里两行可以把后缀删掉以屏蔽目录结构的差异(后续可能需要加一个 if 判断两种目录存在哪个),但是修改后不知道为什么会把 stub 生成到源码里
@megemini 顺师傅有时间可以看下这个么?这个不需要保证目录结构什么的完全不变,构建脚本可以调整 |
有木有测试用例或者测试方法?我这里之前 fd 安装不了,cuda 版本太低 ... ... |
我觉得可以关注下使用方式 这里使用了 以及这里进行了 可以按照这个构造一个 case? |
考虑到 FD 这种依赖目录结构的场景,可能其他下游生态库也有类似使用,周一我问问其他同学这里改动的影响面,暂时可以先不清理 如果影响面实在不可控,可能还是得低版本(79-)用旧逻辑、高版本(80+)用新逻辑这样(按我理解需要在新增代码前面加上 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
啊 好快的删,没事我周一问问其他人再确定,反正就是一个 revert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
昨天忘记说了,昨天和 @zyfncg 讨论的结论是先不为 79-、80+ 单独添加兼容性逻辑,单独解决一下 FD 里的问题就可以了,解决完 FD 问题后如果其他场景有问题反馈再为 79-、80+ 单独添加兼容性逻辑
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
有个地方没想明白,fd 为啥要把安装好的包 copy 到 fd 目录里面?方便后续一并卸载?应该也不是啊,它用的是 cp 而不是 mv ... ...
fd 如果要改的话,把现在生成的两个目录一起 copy 过去应该就可以了 ~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
按我理解是,这些自定义算子 fastdeploy_ops 只是中间产物(仅算子实现),最终会被打包到 fastdeploy 包(含 Python 实现)里进行发布
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
相当于,将这些算子统一放到 fd 的命名空间中进行管理,而不是作为一个个单独的包?
OK ~ 那咱们这里还需要做啥?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fastdeploy/model_executor/ops/gpu/init.py 里的内容变成了 stub 内容
这个?现在是需要咱们这里修改兼容 fd 目前的实现方式?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个?现在是需要咱们这里修改兼容 fd 目前的实现方式?
不一定是框架兼容,可以修改 FD 的 build 脚本
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
嗯 ~ 我的意思也是,咱们这个 pr 应该不需要修改,最好是修改 fd 那边的脚本 ~ 那我给 fd 那边提个 pr ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
嗯,如果确定不是这个 PR 导致的 bug,可以提 PR 改动下,不过那边的 PR 应该兼容两种方式,因为不能保证所有人都用 develop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--install-lib 之前是有问题的,前两天最新的 commit ba9349b 已经改了 ~
fd 那边我这两天提个 pr 改一下 ~
| so_name = os.path.basename(so_path) | ||
| build_dir = os.path.dirname(so_path) | ||
| # The package name equals distribution name | ||
| pkg_name = self.distribution.get_name() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
话说这里的路径可以从 extension.name 获取么?不与包名强绑定,比如 self.get_ext_fullpath(self.extensions[0].name) 这种?因为后续可能会考虑支持 这里描述有点问题,看下面~extension.name = pkg.submodule.ops 这种形式,虽然现在的话 extension.name 应该被 distribution.name 覆盖掉了
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
具体使用场景是这样的~
# setup.py
from paddle.utils.cpp_extension import CppExtension, setup
setup(
name='custom_setup_ops.my_ops.custom_relu',
ext_modules=CppExtension(
sources=['my_ops/relu_cpu.cc']
)
)# pyproject.toml
[project]
name = "custom-setup-ops" # distribution.name 可能拿到 custom-setup-ops 这种没有 normalize 的 name
version = "0.0.0"
[tool.setuptools.packages.find]
exclude = []
[build-system]
requires = ["setuptools>=61.0.0,<80.0.0"]
build-backend = "setuptools.build_meta"// my_ops/relu_cpu.cc
#include "paddle/extension.h"
#include <vector>
#define CHECK_INPUT(x) PD_CHECK(x.is_cpu(), #x " must be a CPU Tensor.")
template <typename data_t>
void relu_cpu_forward_kernel(const data_t* x_data,
data_t* out_data,
int64_t x_numel) {
for (int64_t i = 0; i < x_numel; ++i) {
out_data[i] = std::max(static_cast<data_t>(0.), x_data[i]);
}
}
std::vector<paddle::Tensor> ReluCPUForward(const paddle::Tensor& x) {
CHECK_INPUT(x);
auto out = paddle::empty_like(x);
PD_DISPATCH_FLOATING_TYPES(
x.type(), "relu_cpu_forward_kernel", ([&] {
relu_cpu_forward_kernel<data_t>(
x.data<data_t>(), out.data<data_t>(), x.numel());
}));
return {out};
}
// 维度推导
std::vector<std::vector<int64_t>> ReluInferShape(std::vector<int64_t> x_shape) {
return {x_shape};
}
// 类型推导
std::vector<paddle::DataType> ReluInferDtype(paddle::DataType x_dtype) {
return {x_dtype};
}
PD_BUILD_OP(custom_relu)
.Inputs({"X"})
.Outputs({"Out"})
.SetKernelFn(PD_KERNEL(ReluCPUForward))
.SetInferShapeFn(PD_INFER_SHAPE(ReluInferShape))
.SetInferDtypeFn(PD_INFER_DTYPE(ReluInferDtype));pyproject.toml 用于 uv build,python setup.py bdist_wheel 同样可以复现~
目前打包出来的目录结构如下
dist/custom_setup_ops/my_ops
├── custom_relu.so # so 没有 rename
├── custom_setup_ops.py # 生成的 stub name 应该是 ext name split(".")[-1] 而不是 distribution name
└── version.txt
预期应该是
dist/custom_setup_ops/my_ops
├── custom_relu_pd_.so
├── custom_relu.py
└── version.txt
另外注意这里可能需要一并调整下~
def custom_write_stub(resource, pyfile):
...
with open(pyfile, 'w') as f:
f.write(
_stub_template.format(
resource=os.path.basename(resource), # 这里需要删除前缀
custom_api='\n\n'.join(api_content)
)
)我们有假设 assert self.extensions == 1,可以直接取 self.extensions[0],不需要考虑多个 extensions 的 case,按我理解是微调,不需要大改
虽然这里不在 Setuptools 80+ 支持的考虑范围内,不过这里的调整看起来是泛用性支持,辛苦顺师傅再稍微调整下~
cc. @zhangbo9674
| ext_name = ( | ||
| self.extensions[0].name | ||
| if self.extensions | ||
| else self.distribution.get_name() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里 self.distribution.get_name() 还有使用场景嘛?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extensions 正常构建后,理论上应该不会用到 ~ 那么 _get_extension_name 里面也不用做判断了?
def _get_extension_name(self) -> str:
"""
Get the extension name from the extension module, not the distribution name.
This ensures we use the correct package name from setup.py.
Note: This assumes there is only one extension module (len(ext_modules) == 1).
Returns:
str: The extension name
"""
ext_name = None
if (
hasattr(self.distribution, 'ext_modules')
and self.distribution.ext_modules
):
ext_name = self.distribution.ext_modules[0].name
# If no extension name is found, fall back to distribution name
if ext_name is None:
ext_name = self.distribution.get_name()
return ext_name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
嗯,可以清理下~
| filename, ext = os.path.splitext(resource) | ||
| resource = filename + "_pd_" + ext | ||
| if not filename.endswith("_pd_"): | ||
| resource = filename + "_pd_" + ext |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里具体是什么 case 呀?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
之前在想,是否可以把重命名的过程放到 BuildExtension 中,这样的话,这里传进来的就是带有 pd 的 ~ 不过,现在用不到 ~ 删掉?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
嗯嗯 现在可以先删掉吧~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我这边没其他问题了,FD 验证通过,@zhangbo9674 也在 PaddleFleet 里验证过多级包名的 case 了,后面可以走合入流程了~
SigureMo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
自定义算子模块为独立进程,coverage 测不到,相关单测均已通过,下游生态测试 FD、PaddleFleet 均已测试,Coverage 中的覆盖率部分应当豁免 |
|
呃 ... ... PaddlePaddle/FastDeploy#4998 我这里还有点疑问 ~
合入先等等 ~~~ |
…tools 80+ and support install extension via `pip install . --no-build-isolation` (PaddlePaddle#76008)
|
@megemini 顺师傅,FD 那边 CI/CE 都挂了,辛苦有时间再在 CI 上看一下呢?https://github.com/PaddlePaddle/FastDeploy/actions/runs/19521463241/job/55886058007?pr=5123 目前 QA 暂时 pin 了下 PaddlePaddle 版本 PaddlePaddle/FastDeploy#5136,看看下周一之前能不能修复好,修不好的话我们做一下兼容性策略吧~ |
PR Category
Environment Adaptation
PR Types
Improvements
Description
本 PR 修复了 setuptools 80+ 下自定义算子安装包结构不合理的问题,同时支持了
pip install . --no-build-isolation的现代化安装方式(python setup.py xxx方式已经被 setuptools 弃用很久了)本 PR 不影响之前自定义算子的书写、构建安装、使用方式,对于用户应当是透明的,对安装后的包目录结构有少许变动,但对于 Python 加载机制来说是一致的
修改前,setuptools 79:
修改后,无论 setuptools 79 还是 80 都生成如下结构的文件
另外本 PR 新增的
pip install . --no-build-isolation安装方式则是生成如下结构的文件Related links