Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

【PaddlePaddle Hackathon 2】3、为 Paddle 新增 corrcoef(皮尔逊积矩相关系数) API #40690

Merged
merged 57 commits into from
May 9, 2022

Conversation

liqitong-a
Copy link
Contributor

@liqitong-a liqitong-a commented Mar 17, 2022

PR types

Others

PR changes

APIs

Describe

ISSUE链接:corrcoef
RFC的PR链接:PaddlePaddle/community#46
中文文档链接:PaddlePaddle/docs#4316

@Ligoml
Copy link
Contributor

Ligoml commented Mar 18, 2022

hi~有一些细节需要注意一下 @liqitong-a
1、PR的标题要和ISSUE标题内容一样;
2、Describe 里要加上 ISSUE 链接 & RFC链接 & 中文文档链接

@liqitong-a liqitong-a changed the title 【Hackathon No.3】corrcoef 【PaddlePaddle Hackathon 2】3、为 Paddle 新增 corrcoef(皮尔逊积矩相关系数) API Mar 18, 2022
@paddle-bot-old
Copy link

PR格式检查通过,你的PR将接受Paddle专家以及开源社区的review,请及时关注PR动态。
The format inspection passed. Your PR will be reviewed by experts of Paddle and developers from the open-source community. Stay tuned.

@liqitong-a
Copy link
Contributor Author

hi~有一些细节需要注意一下 @liqitong-a 1、PR的标题要和ISSUE标题内容一样; 2、Describe 里要加上 ISSUE 链接 & RFC链接 & 中文文档链接

好的好的 改好啦


def corrcoef(x, rowvar=True, ddof=False, name=None):
"""
Return Pearson product-moment correlation coefficients.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

文档不要照抄numpy吧,自己尝试写一下能不能更清楚一些


x(Tensor): A N-D(N<=2) Tensor containing multiple variables and observations. By default, each row of x represents a variable. Also see rowvar below.
rowvar(Bool, optional): If rowvar is True (default), then each row represents a variable, with observations in the columns. Default: True
ddof(Bool, optional): Has no effect, do not use.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果已经deprecated,那为什么还要保留?


"""

if ddof is not False:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

同上

warnings.warn('ddof have no effect and are deprecated',
DeprecationWarning)
c = cov(x, rowvar)
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

代码中最好分类处理,不要用try,否则有其他错误也不容易发现

self.shape = [20, 10]

def test_tensor_corr_default(self):
typelist = ['float64']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果支持复数,请添加复数类型测试

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果支持复数,请添加复数类型测试

我想问一下,numpy里的cov支持复数,paddle里的cov不支持复数,corrcoef是再cov的基础上写的,paddle的corrcoef需要支持复数吗,如果需要的话,那cov是需要改成支持复数的吗

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我去看了一下,基础操作现在复数支持还不完善,那可以先不添加复数测试了。

Copy link

@shjNT shjNT Apr 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typelist是否需要补充fp32,

@paddle-bot-old
Copy link

你的PR有最新反馈,请及时修改。
There’s the latest feedback about your PR. Please check.

@@ -0,0 +1,116 @@
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2019->2022

self.assertRaises(ValueError, test_err)


class Corr_Test4(unittest.TestCase):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

每个test类添加一下注释,
增加不支持的数据类型测试案例

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

每个test类添加一下注释, 增加不支持的数据类型测试案例

我改好啦,麻烦看一下哦

tensor = paddle.to_tensor(np_arr, place=p)
corr = paddle.linalg.corrcoef(tensor)
np_corr = numpy_corr(np_arr, rowvar=True, dtype=dtype)
self.assertTrue(np.allclose(np_corr, corr.numpy(), atol=1.e-2))
Copy link
Contributor

@zhwesky2010 zhwesky2010 Apr 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

建议float64和float32分开写单测,两者精度差距比较大 还是分开设置阈值好些,float64不要改rtol,float32改成rtol=1e-4或-5看过不过得了CI

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我试了一下,本地测改成float32改成-5是可以的,提交时候测就过不了

Copy link
Contributor

@zhwesky2010 zhwesky2010 Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我试了一下,本地测改成float32改成-5是可以的,提交时候测就过不了

按道理是不会的,可能是你本地只是某种环境,但是CI会测试 Linux/mac/windows下mkl/openblas/GPU库的多种组合,某种case下过不了。具体挂的是哪条测试呢

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我试了一下,本地测改成float32改成-5是可以的,提交时候测就过不了

按道理是不会的,可能是你本地只是某种环境,但是CI会测试 Linux/mac/windows下mkl/openblas/GPU库的多种组合,某种case下过不了。具体挂的是哪条测试呢

挂的是这个 self.assertTrue(np.allclose(np_corr, corr.numpy(), atol=1.e-2)) ,当使用float32的时候

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我试了一下,本地测改成float32改成-5是可以的,提交时候测就过不了

按道理是不会的,可能是你本地只是某种环境,但是CI会测试 Linux/mac/windows下mkl/openblas/GPU库的多种组合,某种case下过不了。具体挂的是哪条测试呢

挂的是这个 self.assertTrue(np.allclose(np_corr, corr.numpy(), atol=1.e-2)) ,当使用float32的时候

我是说挂的哪条流水线

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果1e-5不行的话,1e-4也行吧,麻烦重新提交下

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我试了一下,本地测改成float32改成-5是可以的,提交时候测就过不了

按道理是不会的,可能是你本地只是某种环境,但是CI会测试 Linux/mac/windows下mkl/openblas/GPU库的多种组合,某种case下过不了。具体挂的是哪条测试呢

挂的是这个 self.assertTrue(np.allclose(np_corr, corr.numpy(), atol=1.e-2)) ,当使用float32的时候

我是说挂的哪条流水线

挂的是PR-CI-Static-Check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果1e-5不行的话,1e-4也行吧,麻烦重新提交下

我重新试一下

self.shape = [20, 10]

def test_tensor_corr_default(self):
typelist = ['float64', 'float32']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

还是过不了,float32精度是要差一些,就测float64精度吧,你找找最小的atol

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

还是过不了,float32精度是要差一些,就测float64精度吧,你找找最小的atol

好的好的 我调整一下

Copy link
Contributor

@zhwesky2010 zhwesky2010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

你好,麻烦float64不要设置atol,float32设置1e-5,分开测再提交一下

@liqitong-a
Copy link
Contributor Author

你好,麻烦float64不要设置atol,float32设置1e-5,分开测再提交一下

float64我测试过的 是可以过的,不过float32设置1e-3一下都不行的。

@zhwesky2010
Copy link
Contributor

zhwesky2010 commented Apr 26, 2022

你好,麻烦float64不要设置atol,float32设置1e-5,分开测再提交一下

float64我测试过的 是可以过的,不过float32设置1e-3一下都不行的。

你按这个标准来提交就可以,后面我们会处理

@liqitong-a
Copy link
Contributor Author

你好,麻烦float64不要设置atol,float32设置1e-5,分开测再提交一下

float64我测试过的 是可以过的,不过float32设置1e-3一下都不行的。

你按这个标准来提交就可以,后面我们会处理

请问是把float32和float64放两个文件测吗

@zhwesky2010
Copy link
Contributor

你好,麻烦float64不要设置atol,float32设置1e-5,分开测再提交一下

float64我测试过的 是可以过的,不过float32设置1e-3一下都不行的。

你按这个标准来提交就可以,后面我们会处理

请问是把float32和float64放两个文件测吗

就是分两个 assertTrue

@zhwesky2010
Copy link
Contributor

@liqitong-a 代码风格检查没有过
infoflow 2022-04-27 19-27-32

@liqitong-a
Copy link
Contributor Author

@liqitong-a 代码风格检查没有过 infoflow 2022-04-27 19-27-32

这样子可以吗

Ligoml
Ligoml previously approved these changes May 5, 2022
Copy link
Contributor

@Ligoml Ligoml left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for docs

@luotao1
Copy link
Contributor

luotao1 commented May 5, 2022

从PR-CI-Coverage的历史记录看 https://xly.bce.baidu.com/paddlepaddle/paddle/newipipe/builds/6512?module=github/PaddlePaddle/Paddle&pipeline=PR-CI-Coverage&branch=pull/40690(develop) 单测存在超时情况,可以考虑缩小输入数据维度来避免

@liqitong-a
Copy link
Contributor Author

从PR-CI-Coverage的历史记录看 https://xly.bce.baidu.com/paddlepaddle/paddle/newipipe/builds/6512?module=github/PaddlePaddle/Paddle&pipeline=PR-CI-Coverage&branch=pull/40690(develop) 单测存在超时情况,可以考虑缩小输入数据维度来避免

我改了一下,可以了

@@ -3181,3 +3182,72 @@ def lstsq(x, y, rcond=None, driver=None, name=None):
singular_values = paddle.static.data(name='singular_values', shape=[0])

return solution, residuals, rank, singular_values


def corrcoef(x, rowvar=True, name=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

接口跟numpy比较,缺少了y参数,缺少的原因是什么?后续会添加y参数吗?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

接口跟numpy比较,缺少了y参数,缺少的原因是什么?后续会添加y参数吗?

这个是由于计算corrcoef首先要计算cov,paddle的cov在编写的时候对比numpy也是没有y参数,后续需要看cov是否添加y参数。

Copy link
Contributor

@zhiboniu zhiboniu May 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

源码:
3c06f698d853ffbf64e4431cfaf0b432
示例:
image

这个原因在于,y是对x元素的补充,最终使用的时候也是跟x拼接起来了,所以跟只用一个拼接好的x是一样的。
从简洁性上考虑,这个y也有点多余。所以以后应该不会添加y参数的。

Copy link
Contributor

@XiaoguangHu01 XiaoguangHu01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG API

@jeff41404 jeff41404 merged commit 95a502a into PaddlePaddle:develop May 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contributor External developers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants