@hushenwei2000 (Contributor)
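Fixes two issues with the variable_length_memory_efficient_attention API. Paddle implements this API by calling into cutlass, so its implementation is assumed to be correct, and the changes are mainly to the PaddleAPITest framework.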

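  1. tester/api_config/config_analyzer.py: every element of the mask array is either -inf or 0. During computation, the mask is added to the attention scores and softmax is then applied, which masks out the selected elements (since exp(-inf) = 0).
  2. tester/paddle_to_torch/rules.py: fixed how K and V are expanded along head_dim; fixed the order of mask and mask_fill. The mask must be applied first, otherwise it overwrites the result of mask_fill.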
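A minimal NumPy sketch of the masking semantics described in the two items above, not the actual Paddle or torch code. The helper name `masked_attention_weights`, the 2-D `(q_len, kv_len)` shapes, and the use of a `kv_seq_len` padding step to stand in for `mask_fill` are all assumptions made for illustration. It shows the additive 0/-inf mask applied first and the fill step applied last, followed by softmax.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max for numerical stability; -inf entries become exp(-inf) = 0.
    x_max = np.max(x, axis=axis, keepdims=True)
    e = np.exp(x - x_max)
    return e / np.sum(e, axis=axis, keepdims=True)


def masked_attention_weights(scores, mask, kv_seq_len):
    """Hypothetical helper (not the PaddleAPITest code).

    scores:     (q_len, kv_len) raw attention scores
    mask:       (q_len, kv_len) additive mask containing only 0 or -inf
    kv_seq_len: number of valid key positions; the rest are treated as padding
    """
    # Step 1: add the mask first. 0 leaves a score unchanged, -inf removes it.
    attn = scores + mask
    # Step 2: a masked_fill-style step that sets padded key positions to -inf.
    # Running it last means no later step can overwrite its result.
    pad = np.arange(scores.shape[-1]) >= kv_seq_len
    attn = np.where(pad[np.newaxis, :], -np.inf, attn)
    # Step 3: softmax turns every -inf position into a weight of exactly 0.
    return softmax(attn, axis=-1)


# Usage: query 0 may not attend to key 1; key 3 is padding (kv_seq_len=3).
mask = np.zeros((2, 4))
mask[0, 1] = -np.inf
weights = masked_attention_weights(np.zeros((2, 4)), mask, kv_seq_len=3)
print(weights)
```

With all-zero scores, query 0 splits its weight evenly over keys 0 and 2, and query 1 over keys 0 to 2; the padded key 3 always gets weight 0. A row in which every key is masked would have a softmax denominator of 0 and produce NaN, so the example keeps at least one valid key per query.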

paddle-bot bot commented Aug 4, 2025

Thanks for your contribution!

@cangtianhuang (Collaborator) left a comment:

LGTM

@cangtianhuang (Collaborator)

done in #540
