A very basic implementation of the ViT paper using the Flax neural network framework. The main goal of this project is to learn the device-agnostic framework, not to get the best results. All results are collected with `wandb.sweep` using a small custom logger wrapper (a sketch follows below).
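The actual wrapper isn't reproduced here; below is a minimal sketch of what it might look like, assuming a thin class around the real `wandb` API. The `WandbLogger` name and its interface are illustrative; only `wandb.init`, `wandb.log`, and `run.finish` are actual wandb calls.

```python
import wandb

class WandbLogger:
    """Forwards scalar metrics from the training loop to a wandb run."""

    def __init__(self, project: str, config: dict):
        self.run = wandb.init(project=project, config=config)

    def log(self, metrics: dict, step: int) -> None:
        # wandb expects a flat dict of named scalars per step.
        wandb.log(metrics, step=step)

    def finish(self) -> None:
        self.run.finish()
```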
The model architecture is suitable for classification tasks only.
- Used the Adam optimizer with a cosine learning-rate schedule and gradient clipping (see the optimizer sketch below);
- Used multi-head self-attention with n = 8 heads and a hidden dimension of 768 (see the encoder-block sketch below);
- Implemented both learnable and sinusoidal positional embeddings, but used the former (both variants are sketched below).
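A minimal sketch of the optimizer setup described above, built with optax (the standard gradient-processing library used with Flax). The peak learning rate, warmup/decay steps, and clip norm are assumed values, not hyperparameters taken from this repo.

```python
import optax

learning_rate = optax.warmup_cosine_decay_schedule(
    init_value=0.0,      # start at zero and warm up linearly
    peak_value=1e-3,     # assumed peak learning rate
    warmup_steps=1_000,  # assumed warmup length
    decay_steps=10_000,  # assumed total schedule length
)

optimizer = optax.chain(
    optax.clip_by_global_norm(1.0),  # gradient clipping, assumed max norm
    optax.adam(learning_rate=learning_rate),
)
```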
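A minimal sketch of an encoder block with those attention settings, using Flax's built-in `nn.MultiHeadDotProductAttention`. The pre-norm layout and 4x MLP expansion follow the ViT paper and are assumptions, not necessarily what this repo does.

```python
import flax.linen as nn

class EncoderBlock(nn.Module):
    """One pre-norm Transformer encoder block (attention + MLP)."""
    hidden_dim: int = 768
    num_heads: int = 8

    @nn.compact
    def __call__(self, x):
        # Multi-head self-attention with a residual connection.
        y = nn.LayerNorm()(x)
        y = nn.MultiHeadDotProductAttention(
            num_heads=self.num_heads,
            qkv_features=self.hidden_dim,
        )(y, y)
        x = x + y
        # Position-wise MLP with a residual connection.
        y = nn.LayerNorm()(x)
        y = nn.Dense(4 * self.hidden_dim)(y)  # 4x expansion as in the paper
        y = nn.Dense(self.hidden_dim)(nn.gelu(y))
        return x + y
```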
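A minimal sketch of the two positional-embedding variants; the module name, shapes, and initializer scale are illustrative assumptions, not code from this repo. The learnable version is the one actually used.

```python
import numpy as np
import jax.numpy as jnp
import flax.linen as nn

class LearnablePositionalEmbedding(nn.Module):
    """Adds a trainable (1, num_tokens, dim) embedding to the input."""

    @nn.compact
    def __call__(self, x):  # x: (batch, num_tokens, dim)
        pos = self.param(
            "pos_embedding",
            nn.initializers.normal(stddev=0.02),  # assumed init scale
            (1, x.shape[1], x.shape[2]),
        )
        return x + pos

def sinusoidal_embedding(num_tokens: int, dim: int) -> jnp.ndarray:
    """Fixed sin/cos table as in the original Transformer paper (dim even)."""
    positions = np.arange(num_tokens)[:, None]
    freqs = np.exp(np.arange(0, dim, 2) * (-np.log(10000.0) / dim))
    table = np.zeros((num_tokens, dim))
    table[:, 0::2] = np.sin(positions * freqs)
    table[:, 1::2] = np.cos(positions * freqs)
    return jnp.asarray(table)
```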
- https://huggingface.co/flax-community/vit-gpt2/tree/main/vit_gpt2
- https://github.com/google/flax/blob/main/examples/imagenet/train.py
- Official implementation
- A good set of JAX tutorials