Add GLU variants #47
Conversation
|
@stas00 I see that the original Megatron codebase has fused activation functions that use JIT. Should we also do this for experimental activation functions? |
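For reference, the upstream fused activations are essentially plain functions wrapped in `torch.jit.script`. A minimal sketch of applying the same pattern to one of the experimental variants (illustrative only, not the PR's actual code; the split convention is an assumption):

```python
import torch
import torch.nn.functional as F

# Sketch: scripting a GLU-variant activation the same way Megatron scripts
# its fused activations. Assumes geglu splits the features in half and
# gates one half with GELU.
@torch.jit.script
def geglu(x: torch.Tensor) -> torch.Tensor:
    a, b = torch.chunk(x, 2, dim=-1)
    return a * F.gelu(b)
```

(As it turns out later in this thread, the `dim=-1` here is exactly what gives TorchScript trouble on some builds.)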
|
Yes, of course! If it works that is. |
|
I think JIT doesn't like |
|
@stas00 Something funky is happening on my local machine, and I'm getting an error with `reglu`. I ran the code on Colab and verified that they all work fine there. Should we be writing unit tests of some sort to ensure that the modules work as intended? |
|
This looks like a truncated traceback, no? From googling, this error usually comes with more detail than that. How do I reproduce it? And yes, we absolutely need to start writing tests, as Meg didn't seem to have any. So if you're inspired, start adding the tests under |
|
Surprisingly, that's all the trace shows me. Below is what I did to cause the error on my local machine (macOS, Python 3.7.3):

```python
>>> from megatron.model.activations import liglu, geglu, reglu, swiglu
>>> import torch
>>> x = torch.randn(8, 100, 768)
>>> reglu(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jaketae/Documents/Dev/GitHub/Megatron-DeepSpeed/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: vector
>>> torch.__version__
'1.9.0'
```

Two more observations:
We will need testing to iron out messy details like this, and I could certainly try writing some very basic tests over the next few days. But unless you can reproduce the error on your machine, I wouldn't be too concerned... or should we be? |
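One quick way to narrow down whether the TorchScript interpreter is at fault (a sketch, assuming `reglu` splits the last dimension in half and gates with ReLU) is to compare against an unscripted equivalent:

```python
import torch
import torch.nn.functional as F

# Eager-mode equivalent of the scripted reglu; if this runs fine while the
# scripted version raises, the TorchScript interpreter is the culprit.
def reglu_eager(x):
    a, b = torch.chunk(x, 2, dim=-1)
    return a * F.relu(b)

x = torch.randn(8, 100, 768)
print(reglu_eager(x).shape)  # torch.Size([8, 100, 384])
```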
|
I get the full trace on my machine; the other three functions produce no error. |
|
Apparently JIT doesn't like negative indexing (source). I replaced the negative index with an explicit positive one; a sketch of the swap follows below. |
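Presumably the swap looked something like this (a sketch; the exact replacement isn't shown in the thread):

```python
import torch
import torch.nn.functional as F

# Before (fails in the TorchScript interpreter on some older builds):
#     a, b = torch.chunk(x, 2, dim=-1)
# After: hardcode the positive feature dimension for a 3-D input.
@torch.jit.script
def reglu(x: torch.Tensor) -> torch.Tensor:
    a, b = torch.chunk(x, 2, dim=2)
    return a * F.relu(b)
```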
|
yeah, that fixed it. Interesting that dim=-1 is what trips JIT up. |
|
Is the input always a 3-dimensional tensor? It will fail with a 4-dimensional tensor, for example. |
|
It's odd that the error is still there; my source is from 2019. Is it safe to assume that all tensors will be three-dimensional? EDIT: Haha, we had the same question. A remedy would be to use an index computed from the tensor's rank instead of hardcoding it; see the sketch below. |
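A sketch of that remedy, computing the last dimension at runtime so any rank works:

```python
import torch
import torch.nn.functional as F

# Rank-agnostic: x.dim() - 1 is the last dimension whatever the rank,
# and avoids both the negative index and a hardcoded dim.
@torch.jit.script
def reglu(x: torch.Tensor) -> torch.Tensor:
    chunks = torch.chunk(x, 2, dim=x.dim() - 1)
    return chunks[0] * F.relu(chunks[1])
```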
|
yes, I just tested it and it works. |
|
supposedly this bug has been fixed recently in pytorch/pytorch#25135 (comment), so probably in pt-1.10.0, but we need to support older pytorch, so let's leave a comment there (along the lines of the sketch below). |
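The final shape of each activation, with such a comment baked in, might look like this (wording is just a suggestion):

```python
import torch
import torch.nn.functional as F

@torch.jit.script
def reglu(x: torch.Tensor) -> torch.Tensor:
    # NOTE: use a computed positive dim, not dim=-1. Negative dims inside a
    # scripted torch.chunk broke the TorchScript interpreter on some older
    # builds; fixed upstream in pytorch/pytorch#25135 (likely pt-1.10.0),
    # but we still need to support older PyTorch versions.
    chunks = torch.chunk(x, 2, dim=x.dim() - 1)
    return chunks[0] * F.relu(chunks[1])
```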
|
Sounds good! Thanks for testing the code on your end. |
stas00 left a comment:
Let's add basic tests; looking great otherwise. Thank you, @jaketae
|
@stas00 I wrote some simple tests. Admittedly, I don't have a lot of experience writing test code, and I wasn't sure if the way I tested the operations makes sense (i.e., the tests somewhat regurgitate the function definitions). I also took a monkey-testing approach and used random inputs, with seeds set at the beginning of the run for reproducibility; a sketch of the idea follows below. Let me know what you think. Thanks! |
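The shape of such a test might look like this (a sketch; the helper and test names are illustrative, not the PR's actual test code):

```python
import torch
import torch.nn.functional as F

from megatron.model.activations import reglu

# Reference implementation the scripted version is checked against; this
# does "regurgitate" the definition, but it pins down today's behavior.
def reference_reglu(x):
    a, b = torch.chunk(x, 2, dim=x.dim() - 1)
    return a * F.relu(b)

def test_reglu():
    torch.manual_seed(0)  # seeded "monkey test": random but reproducible
    x = torch.randn(8, 100, 768)
    out = reglu(x)
    assert out.shape == (8, 100, 384)  # feature dimension is halved
    assert torch.allclose(out, reference_reglu(x))
```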
|
Looks great, @jaketae! We will gradually improve the test suite, so yours is a good start. Let's merge this one. |
|
Thanks for the feedback and review @stas00! |
This PR addresses #44. Specifically, it implements the following activation functions:

- `liglu`
- `geglu`
- `reglu`
- `swiglu`

Nomenclature-wise, Bilinear seems to be synonymous with LiGLU.
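For concreteness, eager-mode definitions of the four variants (a sketch following Shazeer's "GLU Variants Improve Transformer"; which half passes through the nonlinearity is a convention assumed here):

```python
import torch
import torch.nn.functional as F

def _split(x):
    # All variants halve the feature dimension into a value and a gate.
    return torch.chunk(x, 2, dim=x.dim() - 1)

def liglu(x):  # "bilinear": no nonlinearity on the gate
    a, b = _split(x)
    return a * b

def geglu(x):
    a, b = _split(x)
    return a * F.gelu(b)

def reglu(x):
    a, b = _split(x)
    return a * F.relu(b)

def swiglu(x):
    a, b = _split(x)
    return a * F.silu(b)
```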