[Fix] Added dbias and dgelu kernels for ROCm #333
alextmagro
left a comment
mostly LGTM, just need to move some functions to rocm_cast_kernels.cuh for licensing. Great work!
819ec5c to 56b9dce
…entation issues for ifndef
56b9dce to b893a87
wenchenvincent
left a comment
@AllenFarcas There are some core sgpu test failures. Could you make sure that they're fixed?
wenchenvincent
left a comment
LGTM.
Description
Fixes https://github.com/ROCm/frameworks-internal/issues/13667
Type of change
Changes
- `getDeviceComputeCapability()` which is for NVIDIA platforms
- `fp8_quantize_rocm` reusing `CastVectorizedUnaryKernelLauncher` and `CastVectorizedUnaryGradKernelLauncher`
- `partial_reduce_kernel` and `reduce_dbias_rocm` to efficiently reduce large sized inputs
- `test_cast_dbias` and `test_cast_dbias_dgelu` cpp tests for ROCm

Checklist: