Fix value_dim in TransformerDecoder's cross-attn layer
#667
Conversation
Oops, accidentally removed review request for @mattdangerw. Adding it back.
jbischof
left a comment
Thanks for catching this!
mattdangerw
left a comment
Fix looks good! I'm unclear why we need the testing change, though. Is it actually changing the test in any way?
-    intermediate_dim=4, num_heads=2
+    intermediate_dim=4,
+    num_heads=2,
+    has_cross_attention=True,
Hmm, why do we need this actually? Won't this line at the start of `call`, `has_encoder_sequence = encoder_sequence is not None`, mean the layer will be built with cross attention as soon as the decoder is called on two inputs?
Sorry 🤦🏼, not needed. Changing it back.
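For context, a minimal sketch (not part of this PR) of the lazy-build behavior described above, assuming the public `keras_nlp.layers.TransformerDecoder` API and arbitrary example shapes; passing an `encoder_sequence` in the call is what triggers building the cross-attention sublayer:

```python
import numpy as np
import keras_nlp

# Arbitrary example shapes, chosen for illustration only.
decoder_sequence = np.random.uniform(size=(2, 10, 64)).astype("float32")
encoder_sequence = np.random.uniform(size=(2, 12, 64)).astype("float32")

decoder = keras_nlp.layers.TransformerDecoder(intermediate_dim=4, num_heads=2)

# Calling the layer with both inputs builds it with cross-attention,
# so no extra constructor flag is needed in the test.
outputs = decoder(decoder_sequence, encoder_sequence)
print(outputs.shape)  # (2, 10, 64)
```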
mattdangerw
left a comment
Oops actually marked this as "changes requested" until we figure out the testing bit.
This bug cropped up when I was implementing `BartBackbone`: #661. Instead of passing `value_dim = hidden_dim` in the cross-attention layer, we should pass `head_dim`.

Let's look at the `TransformerDecoderBlock` layer given in the `tensorflow/models` repo. `value_dim` is not passed to the `keras.layers.MultiHeadAttention` layer, which means that `value_dim = key_dim = head_dim`.

Intuitively, if we pass `value_dim` as `hidden_dim = 768`, with `num_heads = 12`, the weight matrix for value will be of shape `(768, 12, 768)`. This is incorrect. The shape should be `(768, 12, 64)`.
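To make the shape argument concrete, here is a small sketch (not from the PR) that builds `keras.layers.MultiHeadAttention` both ways and inspects the value projection kernel via the private `_value_dense` attribute (an implementation detail that may change across Keras versions); the `768`/`12` numbers mirror the description above:

```python
import numpy as np
from tensorflow import keras

hidden_dim, num_heads = 768, 12
head_dim = hidden_dim // num_heads  # 64

x = np.zeros((1, 16, hidden_dim), dtype="float32")

# Buggy setup: value_dim explicitly set to hidden_dim.
buggy = keras.layers.MultiHeadAttention(
    num_heads=num_heads, key_dim=head_dim, value_dim=hidden_dim
)
buggy(x, x)  # call once so the layer builds its weights
print(buggy._value_dense.kernel.shape)  # (768, 12, 768)

# Fixed setup: value_dim left unset, so it defaults to key_dim (= head_dim).
fixed = keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=head_dim)
fixed(x, x)
print(fixed._value_dense.kernel.shape)  # (768, 12, 64)
```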