- First version of BERT.jl
- Pretraining module, plus a fine-tuning module for sequence classification
- BertAdam optimizer, a variant of Adam with weight decay
- Tokenization support
- Ability to load weights from PyTorch models
- Usage examples