ConvTranspose layers, MultiheadAttention, lookup embeddings, LayerNorm #34
Conversation
Okay, this is amazing! I'll go through and add comments in-line in a bit, but glancing over now this looks very well done. CC @lucidrains -- I recall you saying on Reddit that you were thinking of implementing some of these, so please feel free to express an opinion / leave review comments if you want. On initialising optimisers: yep, this is something I'm aware of, c.f. #15. I've been meaning to document it. I think what you're doing is the correct way to handle it.
patrick-kidger left a comment:
Right, done making comments! There's quite a lot, but that's only because this is such a big PR, and they're all pretty small, nitpicky things anyway. This is a great PR.
I also haven't thought too hard about padding for transposed convolutional layers. I think having just same/valid padding is probably fine for now, as long as you think it'll be easy to adjust without introducing any backward compatibility concerns, if need be?
By the way, do you want to (a) bump the version number, and (b) add these new layers to the documentation here?
class MultiheadAttention(Module):
    """
    Multihead Attention layer from 'Attention Is All You Need' (https://arxiv.org/abs/1706.03762)
    """
Added documentation in the latest commit - though I had to skip using flake8 because it's complaining about the '\i' and '\s' in '\intercal' and '\sqrt'. How can this be ignored by the checks in this repo?
This is because Python/flake8 is trying to interpret those as escape codes, like \n for a newline. Parsing of escape codes can be disabled by prefixing an r before the string, i.e.: r""" ... stuff ... """
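For example, a minimal sketch (not the actual docstring from the PR) of what the raw-string prefix looks like:

```python
from equinox import Module


class MultiheadAttention(Module):
    r"""Multihead Attention layer from 'Attention Is All You Need'
    (https://arxiv.org/abs/1706.03762).

    Computes $\mathrm{softmax}\left(\frac{Q K^\intercal}{\sqrt{d_k}}\right) V$.

    The leading ``r`` makes this a raw docstring, so ``\intercal`` and
    ``\sqrt`` are kept verbatim rather than being interpreted as escape codes.
    """
```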
class LayerNorm(Module):
    """Layer Normalization as described in https://arxiv.org/abs/1607.06450"""
I think it'd be great to have the precise mathematical description here, c.f. here, including the precise meaning of elementwise_affine. (By all means just copy what they've written verbatim if you wish.)
(And the meaning of normalized_shape, actually.)
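For reference, a sketch of the standard formulation (following the usual PyTorch-style description, not text taken from this PR): the statistics are computed over the trailing dimensions given by normalized_shape, and the affine parameters gamma and beta exist only when elementwise_affine=True.

```latex
% Mean and variance are taken over the trailing normalized_shape dimensions;
% \gamma and \beta are elementwise parameters of shape normalized_shape.
y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} \odot \gamma + \beta
```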
@andyehrenberg Just following up on your plans for this? (I'd be happy to help out on this PR if you want.)

@patrick-kidger Starting to work through your suggestions/corrections - got busy earlier this week. Thanks for the feedback!

Excellent! Let me know when this is ready for me to look at again.
Let me know if these should be put into separate PRs!
For ConvTranspose layers, I'm using the PyTorch-style `output_padding` argument instead of Haiku's `output_size`. Haiku has the limitation that `padding` can only be `'same'` or `'valid'`, though other options seem to be much less frequently useful. My current implementation shares this limitation - I haven't thought deeply enough about the best way to handle arbitrary combinations of `padding` and `output_padding`, so that probably needs some fixing.
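For context, the PyTorch convention relates the input and output lengths of a 1D transposed convolution as follows (this is the general PyTorch formula, not something quoted from this PR):

```latex
% Output length of a 1D transposed convolution (PyTorch convention):
L_{\mathrm{out}} = (L_{\mathrm{in}} - 1) \cdot \mathrm{stride}
                 - 2 \cdot \mathrm{padding}
                 + \mathrm{dilation} \cdot (\mathrm{kernel\_size} - 1)
                 + \mathrm{output\_padding} + 1
```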
The other three modules feel like pretty typical, barebones implementations.
When playing around with trying to reimplement grokking using equinox/these new layers, I found that trying to initialize an optax optimizer with an `eqx.nn.MLP` via `optim.init(model)` throws a TypeError about `optax.transform.init_fn` requiring all arguments to `jnp.zeros_like` to be arrays or scalars. My workaround is just using `params, static = eqx.partition(model, eqx.is_inexact_array)` and then `opt_state = optim.init(params)` - is this sort of thing unavoidable? Most other modules aren't suffering from this, so following along with your demo usually works.
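A sketch of that workaround (the MLP hyperparameters and learning rate below are made up for illustration; the partition/init pattern is the one described above):

```python
import equinox as eqx
import jax.random as jr
import optax

key = jr.PRNGKey(0)
# Hypothetical model sizes, just for illustration.
model = eqx.nn.MLP(in_size=2, out_size=1, width_size=64, depth=2, key=key)

optim = optax.adam(1e-3)

# optim.init(model) fails because the Module pytree contains non-array leaves
# (e.g. the activation function), which jnp.zeros_like cannot handle.
# Partition off the inexact-array leaves and initialise on those instead.
params, static = eqx.partition(model, eqx.is_inexact_array)
opt_state = optim.init(params)
```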