
Conversation

@ooooo-create
Contributor

The indices returned by topk are not guaranteed to be stable when elements are equal; what matters more is the leading (top) values.
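As a minimal illustration of that point (not part of the PR itself; the tensor values here are made up), ties make the returned indices ambiguous:

import paddle

# Two equal top values: the returned values are deterministic, but whether the
# indices come back as [1, 2] or [2, 1] can depend on the kernel/device, which is
# why test inputs for this API are generated with unique elements.
x = paddle.to_tensor([1.0, 3.0, 3.0, 2.0])
values, indices = paddle.topk(x, k=2)
print(values.numpy(), indices.numpy())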

@paddle-bot

paddle-bot bot commented May 30, 2025

Thanks for your contribution!

Collaborator

@Cutelemon6 Cutelemon6 left a comment


There are a few details worth paying attention to.

elif self.dtype in {"int32", "int64"}:
    self.numpy_tensor = numpy.random.choice(numpy.arange(-x_numel, x_numel), size=self.shape, replace=False).astype(self.dtype)
else:
    raise ValueError(f"Unsupported dtype {self.dtype} for paddle.topk")
Collaborator

f"Unsupported dtype {self.dtype} for paddle.topk / paddle.Tensor.topk"

Comment on lines +1675 to +1681
if self.dtype in {"bfloat16", "float32", "float64"}:
    dtype = "float32" if self.dtype == "bfloat16" else self.dtype
    self.numpy_tensor = numpy.linspace(-x_numel, x_numel, x_numel, dtype=dtype).reshape(self.shape)
    if numpy.unique(self.numpy_tensor).size < x_numel:
        self.numpy_tensor = generate_unique_array(x_numel, dtype).reshape(self.shape)
elif self.dtype == "float16":
    self.numpy_tensor = generate_unique_array(x_numel, self.dtype).reshape(self.shape)
Collaborator

Would non-float16 dtypes also end up with duplicate elements through numpy.linspace (rounding error)? Consider constraining the range to the dtype's full representable range before generating:

numpy.linspace(numpy.finfo(dtype).min, numpy.finfo(dtype).max, x_numel, dtype=dtype).reshape(self.shape)

Try float16 as well. I noticed some float16 configs have a very large numel, but float16's representable range is quite limited.
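For context on how duplicates can creep in even for float32 with the original -x_numel..x_numel recipe, a quick check (a sketch, not from the thread; the probe values 2**20, 2**24, 2**27 are just illustrative) is to look at how the gap between adjacent float32 values grows with magnitude:

import numpy

# The linspace step in the diff above is roughly 2 (a range of 2*x_numel spread over
# x_numel points), but the spacing between adjacent float32 values reaches 2 at 2**24
# and keeps doubling beyond that, so consecutive points can round to the same value
# once x_numel gets large enough.
for v in (2.0 ** 20, 2.0 ** 24, 2.0 ** 27):
    print(v, numpy.spacing(numpy.float32(v)))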

Collaborator

@Cutelemon6 Cutelemon6 Jun 5, 2025

Update: float16 can represent at most 63488 finite values; a tensor with more elements than that will produce random output.

You may want to look into float16's representable range; in theory any float16 tensor with fewer than 100 million elements should pass, with no randomness.

print(f"float16 max: {numpy.finfo(numpy.float16).max}, float16 min: {numpy.finfo(numpy.float16).min}, float16 eps: {numpy.finfo(numpy.float16).eps}")
print(f"max tensor numel est: {(numpy.finfo(numpy.float16).max.astype(numpy.float64) - numpy.finfo(numpy.float16).min.astype(numpy.float64)) / numpy.finfo(numpy.float16).eps}")

output

float16 max: 65504.0, float16 min: -65504.0, float16 eps: 0.0009765625
max tensor numel est: 134152192.0

@ooooo-create ooooo-create reopened this Jun 4, 2025
paddle.put_along_axis(Tensor([7, 8000],"float32"), Tensor([7, 799],"int64"), Tensor([7, 799],"float32"), 1, )
paddle.put_along_axis(Tensor([8, 8000],"float32"), Tensor([8, 799],"int64"), Tensor([8, 799],"float32"), 1, )
paddle.put_along_axis(Tensor([9, 8000],"float32"), Tensor([9, 799],"int64"), Tensor([9, 799],"float32"), 1, )
paddle.topk(Tensor([128, 1000],"float16"), k=5, )
Collaborator

For this case, adding the following initialization in config_analyzer.py makes it pass consistently across multiple runs:

elif api_config.api_name == "paddle.topk":
    if self.check_arg(api_config, 0, "x"):
        self.numpy_tensor = numpy.linspace(numpy.finfo(self.dtype).min, numpy.finfo(self.dtype).max, num=self.numel()).astype(self.dtype).reshape(self.shape)

Contributor Author

import numpy
dtype = numpy.float16
numel = 128 * 1000
out = numpy.linspace(numpy.finfo(dtype).min, numpy.finfo(dtype).max, num=numel).astype(dtype)
print(len(numpy.unique(out)))

Printing this actually yields only two distinct values. Decimal numbers are not the same as floating-point numbers: floats are finite (the number of binary bits is fixed), while decimals are infinite. Floats have limited precision, so two decimals whose gap falls below that precision cannot be represented as two distinct floats, and the larger a float gets, the larger the gap between adjacent representable values becomes.
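A minimal sketch of that last point (not from the thread): the gap between adjacent float16 values grows with magnitude, so near the top of the range a linspace step of roughly 1 collapses many points onto the same representable value.

import numpy

# Spacing between adjacent float16 values at a few magnitudes; near 60000 it is 32.0,
# far larger than the ~1.02 step of a linspace over [-65504, 65504] with 128000 points.
for v in (1.0, 100.0, 1000.0, 60000.0):
    print(v, numpy.spacing(numpy.float16(v)))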

Collaborator

You're right, my apologies, I hadn't considered that fully. float16 can represent only sixty-odd thousand distinct values at most, so this case will indeed behave randomly.
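For reference, a quick way to confirm the 63488 figure mentioned earlier (a sketch, not part of the thread) is to enumerate every 16-bit pattern and count the finite ones:

import numpy

# Reinterpret every uint16 bit pattern as a float16; finite values exclude +/-inf and NaN.
all_values = numpy.arange(2 ** 16).astype(numpy.uint16).view(numpy.float16)
print(int(numpy.isfinite(all_values).sum()))  # 63488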


@Cutelemon6
Collaborator

Did all the topk test cases pass?

@ooooo-create
Contributor Author

Did all the topk test cases pass?

Yes, they did. The only one that didn't pass has been put in random_calculation.txt.

Collaborator

@Cutelemon6 Cutelemon6 left a comment


LGTM

Collaborator

@wanghuancoder wanghuancoder left a comment


LGTM

@wanghuancoder wanghuancoder merged commit 12d200d into PFCCLab:main Jun 5, 2025
@luotao1 luotao1 added the HappyOpenSource Pro (advanced Happy Open Source program, more challenging tasks) label Jun 10, 2025
@ooooo-create ooooo-create deleted the fix_topk branch September 29, 2025 09:47