[GELU] Add f32/x4, f16/x2/x8/x8pack kernel. #66

Merged: 5 commits merged into DefTruth:main on Oct 11, 2024
Conversation

@bear-zd (Contributor) commented on Oct 10, 2024:

Saw the mention of GELU in the issue, so I worked on it. Torch has no half-precision GELU implementation (the reason is explained in readme.md), so I implemented some corresponding approximation algorithms.
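For context, the approximation referenced here is presumably the tanh form of GELU, gelu(x) ≈ 0.5·x·(1 + tanh(√(2/π)·(x + 0.044715·x³))). Below is a minimal scalar fp32 sketch of such a kernel; the kernel name and launch shape are illustrative assumptions, not the PR's actual f32x4/f16x2/x8 code:

```cuda
#include <cuda_runtime.h>
#include <math.h>

// tanh approximation of GELU:
//   gelu(x) ~= 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
#define SQRT_2_OVER_PI 0.7978845608028841f
#define GELU_TANH_COEF 0.044715f

// Hypothetical elementwise fp32 kernel: one element per thread.
__global__ void gelu_tanh_f32_kernel(const float* x, float* y, int N) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < N) {
    float v = x[idx];
    float inner = SQRT_2_OVER_PI * (v + GELU_TANH_COEF * v * v * v);
    y[idx] = 0.5f * v * (1.0f + tanhf(inner));
  }
}
```

The vectorized variants named in the PR title presumably apply the same math to float4 or packed half2 loads per thread to reduce memory transactions; the exact packing and intrinsics are in the merged code.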

@DefTruth (Owner) commented:

LGTM

@DefTruth (Owner) left a comment:

Thanks for the contribution~ I ran a format pass, and updated the performance results after re-running them on my machine.

@DefTruth DefTruth merged commit 1eae888 into DefTruth:main Oct 11, 2024
@DefTruth DefTruth mentioned this pull request Oct 11, 2024