[Accuracy diff No.96] Fix accuracy diff for paddle.incubate.nn.functional.variable_length_memory_efficient_attention API #498
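Fixes two issues around the variable_length_memory_efficient_attention API. Paddle implements this API by calling cutlass, so that implementation is assumed to be correct and the changes are mainly in the PaddleAPITest framework.
tester/api_config/config_analyzer.py: every element of the mask array is now either -inf or 0. During computation, the mask is added to the attention scores before softmax, which masks out the selected elements.
tester/paddle_to_torch/rules.py: fixed how K and V are expanded along head_dim; fixed the order of mask and mask_fill. The mask must be applied first, otherwise it overwrites the result of mask_fill.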
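
The additive-mask behaviour described for config_analyzer.py can be illustrated with a minimal sketch in plain Python. The function name masked_softmax and the sample values are hypothetical and are not taken from PaddleAPITest or Paddle; the sketch only shows why adding a mask of 0 / -inf to the scores before softmax drives the masked positions to zero.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over one row of attention scores after adding an additive mask.

    Hypothetical helper for illustration only.
    mask entries are 0.0 (keep the position) or -inf (mask it out).
    """
    shifted = [s + m for s, m in zip(scores, mask)]
    # Subtract the largest finite value for numerical stability.
    finite = [x for x in shifted if x != -math.inf]
    if not finite:
        raise ValueError("every position is masked out")
    peak = max(finite)
    # math.exp(-inf) is 0.0, so masked positions contribute nothing.
    exps = [math.exp(x - peak) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]


scores = [1.0, 2.0, 3.0]
mask = [0.0, 0.0, -math.inf]  # mask out the last position
probs = masked_softmax(scores, mask)
```

With this mask, the last entry of probs is exactly zero and the remaining probability is shared by the first two positions. A mask containing arbitrary random values would shift the scores instead of selecting positions.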