Conversation

@jaketae (Member) commented Aug 4, 2021

This PR addresses #44. Specifically, it implements the following activation functions:

  • LiGLU
  • GEGLU
  • ReGLU
  • SwiGLU

Nomenclature-wise, Bilinear seems to be synonymous with LiGLU.
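
For reference, a minimal eager-mode sketch of what each variant computes (the actual implementations in this PR are JIT-scripted, so details may differ): the input is split in half along the feature dimension, and one half gates the other.

import torch
import torch.nn.functional as F

def liglu(x: torch.Tensor) -> torch.Tensor:
    x1, x2 = x.chunk(2, dim=-1)  # split along the feature (last) dimension
    return x1 * x2  # bilinear: no nonlinearity on the gate

def geglu(x: torch.Tensor) -> torch.Tensor:
    x1, x2 = x.chunk(2, dim=-1)
    return F.gelu(x1) * x2

def reglu(x: torch.Tensor) -> torch.Tensor:
    x1, x2 = x.chunk(2, dim=-1)
    return F.relu(x1) * x2

def swiglu(x: torch.Tensor) -> torch.Tensor:
    x1, x2 = x.chunk(2, dim=-1)
    return F.silu(x1) * x2  # SiLU is Swish with beta=1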

@jaketae (Member, Author) commented Aug 4, 2021

@stas00 I see that the original Megatron codebase has fused activation functions that use JIT. Should we also do this for experimental activation functions?
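
For context, "fused" here means the element-wise function is decorated with torch.jit.script so TorchScript can fuse the pointwise math into a single kernel. A sketch modeled on Megatron's fused bias-GELU (tanh approximation; the exact upstream code may differ):

import torch

@torch.jit.script
def bias_gelu(bias: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Scripting lets the JIT fuse the add and the GELU math into one kernel
    # instead of launching a separate kernel per pointwise op.
    x = bias + y
    return x * 0.5 * (1.0 + torch.tanh(0.79788456 * x * (1.0 + 0.044715 * x * x)))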

@stas00 (Contributor) commented Aug 4, 2021

Yes, of course! If it works, that is.

@jaketae (Member, Author) commented Aug 5, 2021

I think JIT doesn't like None as the default argument for bias. My understanding is that bias terms are added in the preceding layers, before the activation function, so I don't see why we would need one here (I was using GPT-Neo's codebase as a reference). I'll get rid of the bias argument and see if I can get JIT working.
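
As an aside, TorchScript can accept a None default when the parameter is explicitly annotated as Optional; a hypothetical sketch (gated_act is an illustrative name, not code from this PR):

from typing import Optional

import torch

@torch.jit.script
def gated_act(x: torch.Tensor, bias: Optional[torch.Tensor] = None) -> torch.Tensor:
    # TorchScript needs the explicit Optional annotation to allow a None
    # default; an unannotated bias=None is inferred as Tensor and rejected.
    if bias is not None:
        x = x + bias
    x1, x2 = x.chunk(2, dim=2)
    return x1 * x2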

@jaketae self-assigned this Aug 5, 2021
@jaketae changed the title [WIP] Add GLU variants → Add GLU variants Aug 5, 2021
@jaketae changed the title Add GLU variants → [WIP] Add GLU variants Aug 5, 2021
@jaketae (Member, Author) commented Aug 5, 2021

@stas00 Something funky is happening on my local machine, and I'm getting an error with reglu (the rest work fine). Trace:

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: vector

I ran the code on Colab and verified that they all work fine. Should we write unit tests of some sort to ensure the modules work as intended?

@stas00 (Contributor) commented Aug 5, 2021

This looks like a truncated traceback, no? From googling, this error typically appears as vector::something.

How do I reproduce it?

And yes, we absolutely need to start writing tests, as Meg didn't seem to have any. So if you're inspired, start adding tests under tests/, and over time we will expand the suite; we also need to set up CI.

@jaketae (Member, Author) commented Aug 5, 2021

Surprisingly, that's all the trace shows me. Below is what I did to trigger the error on my local machine (macOS, Python 3.7.3).

>>> from megatron.model.activations import liglu, geglu, reglu, swiglu
>>> import torch
>>> x = torch.randn(8, 100, 768)
>>> reglu(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jaketae/Documents/Dev/GitHub/Megatron-DeepSpeed/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: vector

>>> torch.__version__
'1.9.0'

Two more observations:

  • I was not able to reproduce the error on Colab
  • When I moved activations.py to an entirely separate directory and ran the same sequence in the same virtual environment, the error did not occur

We will need testing to iron out messy details like this, and I could certainly try writing some very basic tests over the next few days. But unless you can reproduce the error on your machine, I wouldn't be too concerned... or should we be?

@stas00 (Contributor) commented Aug 5, 2021

I get the full trace on my machine:

python -c "from megatron.model.activations import liglu, geglu, reglu, swiglu; import torch; print(torch.__version__); \
reglu(torch.randn(8, 100, 768))"
1.9.0
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: vector::_M_range_check: __n (which is 18446744073709551615) >= this->size() (which is 3)

The other three produce no error.

@jaketae (Member, Author) commented Aug 5, 2021

Apparently JIT doesn't like negative indexing (source). I replaced dim=-1 with dim=2, and it works fine now (it also passes a mini unit test). Can you confirm? Thanks!

@stas00 (Contributor) commented Aug 5, 2021

Yeah, that fixed it. Interesting that dim=-1 shows up as 18446744073709551615, i.e. -1 reinterpreted as an unsigned 64-bit integer (2^64 - 1).
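
A quick check of that interpretation:

>>> 2**64 - 1
18446744073709551615
>>> (-1) & 0xFFFFFFFFFFFFFFFF
18446744073709551615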

@stas00 (Contributor) commented Aug 5, 2021

Is the input always a 3-dimensional tensor? It will fail for a 4-dimensional tensor, for example.

@jaketae (Member, Author) commented Aug 5, 2021

It's odd that the error is still there; my source is from 2019. Is it safe to assume that all tensors will be three-dimensional?

EDIT: Haha, we had the same question. A remedy would be to use x.ndim. If we can safely assume that the last dimension is always the feature dimension, I think we can do x.ndim - 1 or something of that sort. Does that sound good?

@stas00 (Contributor) commented Aug 5, 2021

Yes, I just tested x1, x2 = x.chunk(2, dim=(x.ndim-1)) as proposed in the thread you linked to, and it works. It sounds like the most generic approach.

@stas00 (Contributor) commented Aug 5, 2021

Supposedly this bug was fixed recently in pytorch/pytorch#25135 (comment), so probably in pt-1.10.0. But we need to support older PyTorch versions, so let's leave a comment there:

# dim=-1 breaks in pt<1.10
x1, x2 = x.chunk(2, dim=(x.ndim-1))

@jaketae (Member, Author) commented Aug 5, 2021

Sounds good! Thanks for testing the code on your end.

@jaketae changed the title [WIP] Add GLU variants → Add GLU variants Aug 5, 2021
@stas00 (Contributor) left a comment

Let's add basic tests, and looking great otherwise. Thank you, @jaketae

@jaketae mentioned this pull request Aug 6, 2021
@jaketae (Member, Author) commented Aug 6, 2021

@stas00 I wrote some simple tests. Admittedly, I don't have a lot of experience writing test code, and I wasn't sure if the way I tested the operations makes sense (i.e., the tests somewhat regurgitate the function definitions). I also took a monkey-testing approach and used random inputs, with seeds set at the beginning of the run for reproducibility. Let me know what you think. Thanks!
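
For reference, a sketch of the shape these tests take (hypothetical; the actual test file may differ): seed the RNG, feed random inputs, and compare each scripted activation against its eager-mode definition.

import torch
import torch.nn.functional as F

from megatron.model.activations import geglu, reglu

def test_glu_variants():
    # Seed at the start of the run so the random inputs are reproducible.
    torch.manual_seed(42)
    x = torch.randn(8, 100, 768)
    x1, x2 = x.chunk(2, dim=2)
    # The expected values restate the function definitions in eager mode;
    # assumes the scripted geglu uses the same GELU formulation as F.gelu.
    assert torch.allclose(reglu(x), F.relu(x1) * x2)
    assert torch.allclose(geglu(x), F.gelu(x1) * x2)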

@stas00 (Contributor) commented Aug 8, 2021

Looks great, @jaketae

We will gradually improve the test suite, so yours is a good start.

Let's merge this one.

@stas00 mentioned this pull request Aug 8, 2021
@jaketae merged commit effb2fb into main Aug 8, 2021
@jaketae (Member, Author) commented Aug 8, 2021

Thanks for the feedback and review @stas00!

@stas00 deleted the activations branch August 8, 2021 16:21