[Refactor] Small refactor for group topk #30562
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Code Review
This pull request introduces a couple of refactorings in the grouped_topk CUDA kernel. The apply_scoring function is made more robust by explicitly handling SCORING_NONE and adding a static_assert for unsupported scoring functions, which is a good improvement. The main change is a performance optimization in group_idx_and_topk_idx_kernel where a division is moved out of a loop. While this improves performance, I've raised a concern about the change in the order of floating-point operations, which could potentially affect numerical precision and determinism. The removal of a commented-out line in a test file is a minor but good cleanup.
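As a rough illustration of the `apply_scoring` change described above, here is a minimal host-side C++17 sketch of compile-time dispatch with a `static_assert` guard. Only the names `apply_scoring` and `SCORING_NONE` come from the review text; the `ScoringFunc` enum, the `SCORING_SIGMOID` value, and the function signature are assumptions made for the example and may not match the actual kernel code.

```cpp
#include <cmath>

// Hypothetical enum for illustration; only SCORING_NONE is named in this PR's review.
enum ScoringFunc : int { SCORING_NONE = 0, SCORING_SIGMOID = 1 };

// Sketch: select the scoring function at compile time and reject any
// enum value that has no explicit branch.
template <ScoringFunc SF>
inline float apply_scoring(float x) {
  static_assert(SF == SCORING_NONE || SF == SCORING_SIGMOID,
                "Unsupported scoring function");
  if constexpr (SF == SCORING_NONE) {
    // Explicit pass-through instead of relying on a fall-through default.
    return x;
  } else {
    // Assumed sigmoid scoring: 1 / (1 + e^-x).
    return 1.0f / (1.0f + std::exp(-x));
  }
}
```

With this shape, adding a new enum value without a matching branch fails at compile time rather than silently falling through to some default path.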
Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Signed-off-by: Ubuntu <mjtaheri68@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
Purpose
Thanks to @mgoin for the comments in #30159 (review) and #29125 (review).
This PR addresses those comments.
Moving the division out of the loop could also give some performance benefit.
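For readers unfamiliar with the pattern, the sketch below shows in plain C++ what hoisting a division out of a loop looks like and why the two forms may differ in the last bits. It is not the kernel code from this PR: the function names, the `scale` parameter, and the normalize-then-scale shape are all hypothetical and chosen only for illustration.

```cpp
#include <vector>

// Hypothetical "before" form: one division per element inside the loop.
std::vector<float> scale_div_in_loop(const std::vector<float>& w, float scale) {
  float sum = 0.0f;
  for (float v : w) sum += v;
  std::vector<float> out;
  out.reserve(w.size());
  for (float v : w) out.push_back(v / sum * scale);
  return out;
}

// Hypothetical "after" form: a single division hoisted out of the loop,
// leaving one multiply per element.
std::vector<float> scale_div_hoisted(const std::vector<float>& w, float scale) {
  float sum = 0.0f;
  for (float v : w) sum += v;
  const float factor = scale / sum;  // computed once
  std::vector<float> out;
  out.reserve(w.size());
  for (float v : w) out.push_back(v * factor);
  return out;
}
```

The two forms are algebraically equal, but `(v / sum) * scale` and `v * (scale / sum)` round at different intermediate steps, so results can differ by a few units in the last place. That is the floating-point ordering concern raised in the review.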
Test