[Codegen] Port AMDGPU device lib implementations to MLIR rewrites #20213
Labels
- codegen: Shared code generation infrastructure and dialects
- good first issue 🌱: Good for newcomers
- onboarding/codegen: Tasks suitable for new team member onboarding
Overview
For accuracy and/or performance reasons, on AMDGPU backends we prefer device lib implementations of certain math functions (see #19970), for example:
https://github.com/ROCm/llvm-project/tree/amd-staging/amd/device-libs/ocml/src
Erf: https://github.com/ROCm/llvm-project/blob/amd-staging/amd/device-libs/ocml/src/erfF.cl
The erf implementation linked here is quite simple, assuming a fast implementation of exp. AMD GPUs provide one, which is what makes this implementation fast. The same reasoning can apply to other targets, assuming they provide a fast exp implementation.
Task
- Port the device lib implementations as rewrites from math.* ops to scf + arith (+ more primitive math ops if needed).
- Add a has_fast_exp pass option to MathTransformsPass that controls application of the new rewrites.
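To make the idea concrete, here is a minimal, hedged sketch in plain C++ (not MLIR) of an erf expansion that needs only basic arithmetic plus a single exp call, together with an option that gates it. It uses the Abramowitz and Stegun 7.1.26 approximation, which is not the polynomial used in the OCML source linked above. The names MathRewriteOptions, erfViaExp, and lowerErf are hypothetical and exist only for illustration; the hasFastExp field merely stands in for the proposed has_fast_exp pass option.

```cpp
#include <cmath>

// Hypothetical stand-in for the proposed has_fast_exp pass option.
// The real option would live on MathTransformsPass.
struct MathRewriteOptions {
  bool hasFastExp = false;
};

// erf(x) approximated with Abramowitz & Stegun formula 7.1.26
// (assumption: chosen for illustration, NOT the OCML polynomial).
// The body uses only abs, mul, add, div, copysign and one exp call,
// i.e. operations with direct arith/math dialect counterparts.
inline double erfViaExp(double x) {
  const double p = 0.3275911;
  const double a1 = 0.254829592;
  const double a2 = -0.284496736;
  const double a3 = 1.421413741;
  const double a4 = -1.453152027;
  const double a5 = 1.061405429;

  double ax = std::fabs(x);
  double t = 1.0 / (1.0 + p * ax);
  // Horner evaluation of a1*t + a2*t^2 + a3*t^3 + a4*t^4 + a5*t^5.
  double poly = ((((a5 * t + a4) * t + a3) * t + a2) * t + a1) * t;
  double y = 1.0 - poly * std::exp(-ax * ax);
  // erf is odd, so restore the sign of the input.
  return std::copysign(y, x);
}

// Apply the exp-based expansion only when the target claims a fast exp;
// otherwise fall back to the library erf.
inline double lowerErf(double x, const MathRewriteOptions &opts) {
  return opts.hasFastExp ? erfViaExp(x) : std::erf(x);
}
```

In an actual rewrite pattern, the multiplications, additions and division would presumably become arith.mulf, arith.addf and arith.divf, with math.absf, math.copysign and math.exp as the remaining primitive math ops. This particular variant is branch-free, so it would not need scf; an implementation that selects between polynomials by input range might.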