Sign CUDA Kernel #17293
If the kernel accounts for only a minor share of the time in the profile result, I think adding a CUDA kernel for Sign is much simpler, since it only requires a few lines of change in the unary elementwise code. It would also help ORT run inference or forward graphs containing Sign on CUDA in the future...
Makes sense. I was initially debating whether to add the Sign CUDA kernel or the AbsGrad CUDA kernel. I have now made the change to add the Sign CUDA kernel.
Thank you for the review, @er3x3 @hariharans29.
Cherry-pick PRs: #18026 #17912 #17901 "2 lines added whitespace errors when cherry-picking" #17293 #17364 #17505 #17885
This PR contains all the cherry-picks for the patch release except:
1. The PRs marked with sdxl_llama
2. #17772, which has a merge conflict.
---------
Co-authored-by: Chi Lo <[email protected]>
Co-authored-by: Scott McKay <[email protected]>
Co-authored-by: Baiju Meswani <[email protected]>
Co-authored-by: Kaz Nishimura <[email protected]>
l1_loss is defined as:
mean(abs(y1 - y2))
If y = abs(x), dy/dx = sign(x).
In onnxruntime, Sign does not have a CUDA kernel. As a result, the execution graph looks like:
MemcpyToHost -> Sign -> MemcpyFromHost
This PR implements the Sign CUDA kernel so as to avoid the memcpy.
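To make the relationship between l1_loss and Sign concrete, here is a minimal host-side C++ sketch of the per-element computation. It is not the code from this PR: `SignOf` and `L1LossGrad` are hypothetical names chosen for illustration, and the sketch assumes both inputs have the same length and that the incoming gradient is 1. A CUDA unary elementwise kernel would apply the same per-element expression to each element on the device.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not an onnxruntime API): returns -1, 0, or +1
// in the same type as the input, using two comparisons.
template <typename T>
T SignOf(T x) {
  return static_cast<T>((T(0) < x) - (x < T(0)));
}

// Gradient of mean(abs(y1 - y2)) with respect to y1, assuming an
// upstream gradient of 1: each element is sign(y1[i] - y2[i]) / N.
// Assumes y1.size() == y2.size().
std::vector<float> L1LossGrad(const std::vector<float>& y1,
                              const std::vector<float>& y2) {
  const std::size_t n = y1.size();
  std::vector<float> grad(n);
  for (std::size_t i = 0; i < n; ++i) {
    grad[i] = SignOf(y1[i] - y2[i]) / static_cast<float>(n);
  }
  return grad;
}
```

For example, calling `L1LossGrad({1, 2, 3, 4}, {2, 2, 1, 5})` yields signs of -1, 0, 1, -1 scaled by 1/4. Because the backward pass depends only on this sign computation, keeping Sign on the device is what removes the MemcpyToHost and MemcpyFromHost nodes around it.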