FNet_with_BART_classification

Description

FNet proposes to replace the Transformer's self-attention layers (which have O(n^2) computational complexity) with linear transformations that mix input tokens. More specifically, the authors find that replacing the attention mechanism with a standard, unparameterized Fourier Transform achieves 92% of the accuracy of BERT on the GLUE benchmark, while pre-training and running up to seven times faster on GPUs and twice as fast on TPUs.

This project leverages an existing implementation of FNet's encoder (https://github.com/jaketae/fnet.git) and completes the code to train the model on the Stanford Sentiment Treebank (SST) dataset. SST, in the binary form used here, is a sentiment classification task and is part of the GLUE benchmark.
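
For reference, the binary SST task can be fetched with the HuggingFace `datasets` library; the loader and column names below are an assumption, since this repository may obtain the data differently:

```python
# Illustrative only (not necessarily how this repository loads the data):
# "sst2" is the binary-sentiment GLUE task built from the Stanford Sentiment Treebank.
from datasets import load_dataset

sst2 = load_dataset("glue", "sst2")                              # splits: train / validation / test
print(sst2["train"][0]["sentence"], sst2["train"][0]["label"])   # raw sentence and its 0/1 label
```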

The original FNet model (from the paper) is pre-trained on masked language modelling (MLM) and next sentence prediction (NSP). However, the paper was only published one month ago and the pre-trained model is not available online. In order to harness transfer learning and speed up the model's convergence, this project initialises FNet's encoder with BartForSequenceClassification parameters (loaded from HuggingFace). Additionally, BartTokenizer and BartLearnedPositionalEmbedding are used to process the input sequence before it is fed to the encoder.
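
A minimal sketch of this weight-transfer idea is shown below, assuming the `facebook/bart-base` checkpoint; the exact mapping of these parameters into the FNet encoder depends on the layer naming in the jaketae/fnet implementation and is not reproduced here:

```python
# Sketch only: load BART's classification model and pull out the pieces that are reused.
import torch
from transformers import BartTokenizer, BartForSequenceClassification

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
bart = BartForSequenceClassification.from_pretrained("facebook/bart-base", num_labels=2)

encoder = bart.model.encoder          # BART encoder stack whose weights seed the FNet encoder
token_embed = encoder.embed_tokens    # token embedding table
pos_embed = encoder.embed_positions   # BartLearnedPositionalEmbedding

# Tokenise an SST sentence; token embeddings (plus the learned positional
# embeddings) form the input representation fed to the Fourier-mixing encoder.
inputs = tokenizer("a gripping, well-acted film", return_tensors="pt")
hidden = token_embed(inputs["input_ids"])   # positional embeddings are added before the encoder
```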

In other words, the model can be seen as BART's encoder with a Fourier Transform token mixing mechanism (from FNet) instead of self-attention.
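
The token-mixing step itself can be sketched as follows (a simplified, self-contained version, not the repository's code): a 2D discrete Fourier transform over the hidden and sequence dimensions, keeping only the real part, followed by the usual residual connections, layer norms and feed-forward block.

```python
import torch
import torch.nn as nn

class FourierMixingLayer(nn.Module):
    """One FNet-style encoder block: unparameterized Fourier token mixing + feed-forward."""

    def __init__(self, d_model: int = 768, d_ff: int = 3072, dropout: float = 0.1):
        super().__init__()
        self.mix_norm = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
            nn.Dropout(dropout),
        )
        self.ff_norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # FFT over the hidden dimension, then over the sequence dimension; keep the real part.
        mixed = torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real
        x = self.mix_norm(x + mixed)          # residual + norm around the mixing step
        return self.ff_norm(x + self.ff(x))   # residual + norm around the feed-forward step

# Example: a batch of 2 sequences, 16 tokens each, BART-base hidden size 768.
out = FourierMixingLayer()(torch.randn(2, 16, 768))
```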

How to run the project

Clone the repository, then run the following commands:

  1. cd FNet_with_BART_classification
  2. pip install -r requirements.txt
  3. python fnet.py

Citation

@article{DBLP:journals/corr/abs-2105-03824,
     author = {James Lee-Thorp and Joshua Ainslie and Ilya Eckstein and Santiago Ontañón},
     title = {FNet: Mixing Tokens with Fourier Transforms},
     journal = {CoRR},
     volume = {abs/2105.03824},
     year = {2021},
     url = {https://arxiv.org/abs/2105.03824},
     archivePrefix = {arXiv},
     eprint = {2105.03824},
     timestamp = {Fri, 14 May 2021 12:13:30 +0200},
     biburl = {https://dblp.org/rec/journals/corr/abs-2105-03824.bib},
     bibsource = {dblp computer science bibliography, https://dblp.org}
}
