
Implementation of Transformers from scratch in TensorFlow/Keras, as described in the paper "Attention Is All You Need".


Transformers

In sequence-to-sequence problems such as neural machine translation, the initial proposals were based on RNNs in an encoder-decoder architecture. These architectures have a major limitation when working with long sequences: their ability to retain information from the first elements is lost as new elements are incorporated into the sequence.

To deal with this limitation, a new concept was introduced: the attention mechanism.

Attention

The Transformer model extracts features for each word using a self-attention mechanism that figures out how important all the other words in the sentence are with respect to that word. No recurrent units are used to obtain these features; they are just weighted sums and activations, so the computation is highly parallelizable and efficient. A minimal sketch of this idea is shown below.
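The following is a minimal sketch of scaled dot-product attention in TensorFlow, assuming inputs of shape (batch, seq_len, depth). The function and argument names are illustrative and not taken from this repository's code.

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    # Similarity scores between every query and every key: (batch, seq_q, seq_k)
    scores = tf.matmul(q, k, transpose_b=True)

    # Scale by sqrt(d_k) so the softmax stays well-behaved for large depths
    d_k = tf.cast(tf.shape(k)[-1], tf.float32)
    scores = scores / tf.math.sqrt(d_k)

    # Optional mask (padding or look-ahead): masked positions get a large negative value
    if mask is not None:
        scores += (mask * -1e9)

    # Attention weights sum to 1 over the key dimension
    weights = tf.nn.softmax(scores, axis=-1)

    # Output is a weighted sum of the values: (batch, seq_q, depth_v)
    return tf.matmul(weights, v), weights
```

Because the whole computation is matrix multiplications and a softmax, every position in the sequence can be processed at once, unlike an RNN that must step through the sequence token by token.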

Research Paper: Attention Is All You Need

This is an implementation of the Transformer attention model using TensorFlow/Keras.
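As a quick usage illustration (not the repository's own API), Keras also ships a built-in multi-head attention layer; self-attention is simply attending from a sequence to itself. The shapes below are arbitrary example values.

```python
import tensorflow as tf

# Toy batch: 2 sentences, 5 tokens each, 64-dimensional embeddings
x = tf.random.normal((2, 5, 64))

# 8 heads, each working on a 64 // 8 = 8-dimensional projection
mha = tf.keras.layers.MultiHeadAttention(num_heads=8, key_dim=64 // 8)

# Self-attention: query, key, and value are all the same sequence
out = mha(query=x, value=x, key=x)
print(out.shape)  # (2, 5, 64)
```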
