Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

【Hackathon No.25】为 Paddle 新增 nanquantile 数学计算API #41343

Merged
merged 20 commits into from
Apr 15, 2022
Merged

【Hackathon No.25】为 Paddle 新增 nanquantile 数学计算API #41343

merged 20 commits into from
Apr 15, 2022

Conversation

Asthestarsfalll
Copy link
Contributor

@Asthestarsfalll Asthestarsfalll commented Apr 2, 2022

PR types

Others

PR changes

APIs

Describe

解决了issue:#40308
增加了paddle.nanquantile。其是paddle.quantile的变体,即沿给定的轴计算非nan元素的分位数。
设计文档:PaddlePaddle/community#55

修改了paddle.quantile的代码,使其与paddle.nanquantile共用绝大部分代码,所有计算逻辑都在函数_compute_quantile中。
修复了quantile不能对含NaN的输入计算正确结果的问题,修复了quantile单测代码中静态图失败的问题。
修复了quantile不能与numpy结果高精度对齐,并且在输出格式上与numpy对齐,对于任意数据类型的输入,结果都是float64

@paddle-bot-old paddle-bot-old bot added contributor External developers status: proposed labels Apr 2, 2022
@paddle-bot-old
Copy link

paddle-bot-old bot commented Apr 2, 2022

你的PR提交成功,感谢你对开源项目的贡献!
请关注后续CI自动化测试结果,详情请参考Paddle-CI手册
Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@yang131313
Copy link
Contributor

PR-CI-Coverage没过说明代码覆盖率不够,这个CI是需要通过的噢~

@Asthestarsfalll
Copy link
Contributor Author

@yang131313 你好,日志上显示超时,请问该如何处理呢。这个api本身需要覆盖的情况是比较多的,可以设置超时时间吗?

@paddle-bot-old
Copy link

paddle-bot-old bot commented Apr 6, 2022

PR格式检查通过,你的PR将接受Paddle专家以及开源社区的review,请及时关注PR动态。
The format inspection passed. Your PR will be reviewed by experts of Paddle and developers from the open-source community. Stay tuned.

@luotao1
Copy link
Contributor

luotao1 commented Apr 6, 2022

@Asthestarsfalll 可以试试把输入数据变小一点来通过单测


import numpy as np
import paddle
x = np.random.randn(2, 3)
Copy link
Contributor

@luotao1 luotao1 Apr 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

建议示例不要用随机数,用一个固定构造的数组。2*3矩阵里的元素可以简单便于手动计算:

  • quantile和nanquantile更容易让读者发现两者的区别
  • 对于插值计算也容易理解

def nanquantile(x, q, axis=None, keepdim=False):
"""
Compute the quantile of the input as if NaN values in input did not exist.
If all values in a reduced row are NaN then the quantiles for that reduction will be NaN.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all values in a reduced row are NaN
后面加逗号

需要补充全NAN的示例

def quantile(x, q, axis=None, keepdim=False):
"""
Compute the quantile of the input along the specified axis.
If any values in a reduced row are NaN then the quantiles for that reduction will be NaN.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If any values in a reduced row are NaN
后面加逗号

x = paddle.to_tensor(input_data)
paddle_res = paddle.nanquantile(x, q=0.35, axis=0)
np_res = np.nanquantile(x, q=0.35, axis=0)
self.assertTrue(np.allclose(paddle_res.numpy(), np_res, equal_nan=True))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

单测和quantile几乎一摸一样,但nanquantile更要侧重对NAN的测试:

  • 可以修改一下已有单测,每个Class里增加不同位置NAN的测试。包括一个或者多个NAN
  • 缺少全是NAN的测试

因为两份单测非常类似,如果可以的话,看如何更好地进行复用(非强制要求),如

test_case(self.x)
test_case(self.x, [])
test_case(self.x, -1)
test_case(self.x, keepdim=True)
test_case(self.x, 2, keepdim=True)
test_case(self.x, [0, 2])
test_case(self.x, (0, 2))
test_case(self.x, [0, 1, 2, 3])
paddle.enable_static()

_test_static_graph('amax')
_test_static_graph('amin')
_test_static_graph('max')
_test_static_graph('min')

@Asthestarsfalll
Copy link
Contributor Author

@luotao1 你好,已修改。

x, q=[0.1, 0.2, 0.3], axis=[1, 2], keepdim=True)
np_res = np.quantile(
self.input_data, q=[0.1, 0.2, 0.3], axis=[1, 2], keepdims=True)
self.assertTrue(np.allclose(paddle_res.numpy(), np_res))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

99-144行的单测为什么不用API_list呢?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

数值正确性在TestQuantileAndNanquantile中进行测试,我认为TestMuitlpleQ更多的是想验证当输入多个q时结果是否正常,而quantile和nanquantile在计算方式上仅有些微差别(我相信TestQuantileAndNanquantile能覆盖绝大多数情况),对于多个q的情况,也只是将单次的结果加入列表中,输出时stack起来。另一方面也是防止测试太多超时(还是超时了),因此我认为只使用quantile是足够的(或者改为quantile和nanquantile交替使用)。

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

好的

@luotao1
Copy link
Contributor

luotao1 commented Apr 11, 2022

另一方面也是防止测试太多超时(还是超时了)

超时的话,1)可以试试把测试数据变小, 2)分成两个单测文件

@PaddlePaddle PaddlePaddle locked and limited conversation to collaborators Apr 13, 2022
@PaddlePaddle PaddlePaddle unlocked this conversation Apr 13, 2022
@luotao1 luotao1 closed this Apr 14, 2022
@luotao1 luotao1 reopened this Apr 14, 2022
luotao1
luotao1 previously approved these changes Apr 14, 2022
Copy link
Contributor

@luotao1 luotao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -445,6 +446,7 @@
'numel',
'median',
'quantile',
'nanquantile'
Copy link
Contributor

@luotao1 luotao1 Apr 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

又少了一个逗号了。。
可以通过看日志发现:https://xly.bce.baidu.com/paddlepaddle/paddle/newipipe/detail/5392118/job/14041205


2022-04-14 01:51:31 There are 3 approved errors.
2022-04-14 01:51:31 ****************
2022-04-14 01:51:32 API Difference is: 
2022-04-14 01:51:32 - paddle.Tensor.is_complex (ArgSpec(args=['x'], varargs=None, varkw=None, defaults=None, kwonlyargs=[], kwonlydefaults=None, annotations={}), ('document', '9d4dc47b098ce34e65cc23e14ad02281'))
2022-04-14 01:51:32 - paddle.Tensor.quantile (ArgSpec(args=['x', 'q', 'axis', 'keepdim'], varargs=None, varkw=None, defaults=(None, False), kwonlyargs=[], kwonlydefaults=None, annotations={}), ('document', 'd6e25fbeb7751f8e57ad215209f36e00'))
2022-04-14 01:51:32 ?                                                                                                                                                                                          ^ ^^^^^^^   ^^^^^ ^^^^^^^ ^^^^^^^^^

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

抱歉,已修改。当时看的时候还以为是__all__里的。。

@luotao1 luotao1 requested a review from DDDivano April 15, 2022 02:17
@dingjiaweiww dingjiaweiww self-requested a review April 15, 2022 02:18
@Asthestarsfalll
Copy link
Contributor Author

实在抱歉,还是有些纰漏。之前发现时未及时修改,后面事情太多忘记了..

@luotao1
Copy link
Contributor

luotao1 commented Apr 15, 2022

实在抱歉,还是有些纰漏。之前发现时未及时修改,后面事情太多忘记了

是有什么问题呢?文档还是哪块?

@Asthestarsfalll
Copy link
Contributor Author

Asthestarsfalll commented Apr 15, 2022

文档还是哪块?

文档部分

@luotao1
Copy link
Contributor

luotao1 commented Apr 15, 2022

文档部分

文档部分下一个PR修吧,这个PR涉及approve的同学比较多,先合入主体功能

@Asthestarsfalll
Copy link
Contributor Author

文档部分下一个PR修吧,这个PR涉及approve的同学比较多,先合入主体功能

收到(。>ㅿ<。)

@luotao1 luotao1 merged commit b9ee6a2 into PaddlePaddle:develop Apr 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contributor External developers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants