baijumeswani commented Aug 25, 2023

l1_loss is defined as: mean(abs(y1 - y2))

If y = abs(x), dy/dx = sign(x).
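
Spelling out the step between the two statements above, as a sketch: N (the number of elements) and the index i are symbols introduced here for illustration and do not appear in the original description. The sign form holds wherever y1_i differs from y2_i; at equality abs is not differentiable, and sign(0) = 0 is the usual subgradient convention.

```latex
L = \frac{1}{N} \sum_{i=1}^{N} \left| y_{1,i} - y_{2,i} \right|
\qquad\Longrightarrow\qquad
\frac{\partial L}{\partial y_{1,i}} = \frac{1}{N}\,\operatorname{sign}\!\left(y_{1,i} - y_{2,i}\right),
\qquad
\frac{\partial L}{\partial y_{2,i}} = -\frac{1}{N}\,\operatorname{sign}\!\left(y_{1,i} - y_{2,i}\right)
```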

In onnxruntime, Sign does not have a CUDA kernel, so the node falls back to the CPU. As a result, the execution graph looks like: MemcpyToHost -> Sign -> MemcpyFromHost

This PR implements the Sign CUDA kernel to avoid the memcpy nodes.
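
For readers unfamiliar with the operator, here is a minimal host-side C++ sketch of the per-element computation a Sign kernel performs and of how it feeds the l1_loss gradient. It is not the code from this PR: `SignOf` and `L1LossGrad` are hypothetical names chosen for illustration, and a real CUDA kernel would apply the same per-element expression in parallel on the device instead of in a CPU loop.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not the ORT implementation): elementwise sign.
// Returns 1 for positive input, -1 for negative input, and 0 for zero,
// which matches the ONNX Sign operator's definition.
template <typename T>
T SignOf(T x) {
  // Each comparison yields 0 or 1, so the difference is -1, 0, or 1.
  return static_cast<T>((T(0) < x) - (x < T(0)));
}

// Hypothetical helper: gradient of mean(abs(y1 - y2)) with respect to y1,
// i.e. sign(y1 - y2) / N, assuming y1 and y2 have the same length N > 0.
template <typename T>
std::vector<T> L1LossGrad(const std::vector<T>& y1, const std::vector<T>& y2) {
  std::vector<T> grad(y1.size());
  const T scale = T(1) / static_cast<T>(y1.size());
  for (std::size_t i = 0; i < y1.size(); ++i) {
    grad[i] = SignOf(y1[i] - y2[i]) * scale;
  }
  return grad;
}
```

Because every element is computed independently, the operation maps naturally onto one GPU thread per element, which is what makes keeping it on the device preferable to a round trip through host memory.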

@baijumeswani added the training label (issues related to ONNX Runtime training; typically submitted using template) Aug 25, 2023
Lafi7e commented Aug 25, 2023

If the kernel time is a minor percentage of the profile result, I actually think that adding a CUDA kernel for Sign is much simpler, as it only requires a few lines of changes in the unary elementwise ops, and it also helps ORT run inference or forward graphs with Sign on CUDA in the future...

@baijumeswani force-pushed the baijumeswani/abs-grad branch from 2ea9aa9 to 9220980 August 25, 2023 21:15
@baijumeswani
Contributor Author

> If the kernel time is a minor percentage of the profile result, I actually think that adding a CUDA kernel for Sign is much simpler, as it only requires a few lines of changes in the unary elementwise ops, and it also helps ORT run inference or forward graphs with Sign on CUDA in the future...

Makes sense. Initially, I was contemplating whether to add the Sign CUDA kernel or the AbsGrad CUDA kernel.

I have now made the change to add the Sign CUDA kernel.

@baijumeswani changed the title from "AbsGrad CPU and CUDA Kernels" to "Sign CUDA Kernel" Aug 25, 2023
Lafi7e previously approved these changes Aug 28, 2023
hariharans29 previously approved these changes Aug 28, 2023
@baijumeswani dismissed stale reviews from hariharans29 and Lafi7e via 17a58d8 August 28, 2023 17:21
@baijumeswani merged commit 5d2c573 into main Aug 29, 2023
@baijumeswani deleted the baijumeswani/abs-grad branch August 29, 2023 04:03
@baijumeswani
Contributor Author

Thank you for the review @er3x3 @hariharans29

snnn pushed a commit that referenced this pull request Nov 2, 2023
Cherry-pick PRs: 
#18026 
#17912 
#17901 "2 lines added whitespace errors when cherry-picking"
#17293 
#17364 
#17505 
#17885

This PR contains all the cherry-picks for the patch release except:
1. The PRs marked with sdxl_llama
2. #17772 which has a merge conflict.

---------

Co-authored-by: Chi Lo <[email protected]>
Co-authored-by: Chi Lo <[email protected]>
Co-authored-by: Scott McKay <[email protected]>
Co-authored-by: Baiju Meswani <[email protected]>
Co-authored-by: Kaz Nishimura <[email protected]>
Co-authored-by: Scott McKay <[email protected]>
kleiti pushed a commit to kleiti/onnxruntime that referenced this pull request Mar 22, 2024