Vision Transformer Architecture Reimplementation

This project reimplements the Vision Transformer (ViT) architecture from the paper An Image is Worth 16x16 Words by Alexey Dosovitskiy et al. ViT applies the transformer architecture, originally developed for natural language processing, directly to image recognition.

[Figure: Vision Transformer architecture overview]

Key Features

  • Transformer Architecture: The ViT model treats an image as a sequence of patches and encodes that sequence with a standard transformer encoder (a minimal sketch follows this list).

  • Image Patch Embedding: The input image is split into fixed-size patches, which are then linearly embedded and serve as the input sequence for the transformer encoder.

  • Position Embeddings: To retain positional information, learnable position embeddings are added to the patch embeddings.

  • Pre-training on Large Datasets: The ViT model can be pre-trained on large datasets like ImageNet and then fine-tuned on downstream tasks. (This reimplementation trains on CIFAR-10; see the Dataset section below.)
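
Taken together, these pieces fit in a short PyTorch sketch. The names (PatchEmbedding, TinyViT) and hyperparameters below (patch_size=4, embed_dim=192, depth=6) are illustrative choices for 32x32 inputs, not necessarily what this repository's train.py uses:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into fixed-size patches and linearly embed them."""
    def __init__(self, img_size=32, patch_size=4, in_channels=3, embed_dim=192):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        # A conv with kernel = stride = patch_size is equivalent to cutting
        # non-overlapping patches and applying a shared linear projection.
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)
        # Learnable [class] token and position embeddings.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))

    def forward(self, x):                    # x: (B, C, H, W)
        x = self.proj(x)                     # (B, D, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)     # (B, N, D) patch sequence
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)       # prepend the class token
        return x + self.pos_embed            # add positional information

class TinyViT(nn.Module):
    """Patch embedding followed by a standard transformer encoder."""
    def __init__(self, num_classes=10, embed_dim=192, depth=6, heads=3):
        super().__init__()
        self.embed = PatchEmbedding(embed_dim=embed_dim)
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=heads,
                                           dim_feedforward=4 * embed_dim,
                                           batch_first=True,
                                           norm_first=True)  # ViT uses pre-norm
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        x = self.encoder(self.embed(x))
        return self.head(x[:, 0])            # classify from the [class] token

# CIFAR-10 sized input: 64 patches + 1 class token per image
logits = TinyViT()(torch.randn(2, 3, 32, 32))
print(logits.shape)  # torch.Size([2, 10])
```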

Installation

To install the necessary dependencies, run:

pip install -r requirements.txt

Training the Model

To train the model, execute the following command:

python train.py

Dataset

This implementation uses the CIFAR-10 dataset, which consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class.
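
For reference, CIFAR-10 is available directly through torchvision. The sketch below is one common way to load it; the transforms and batch size are illustrative, not necessarily what train.py uses (the normalisation values are the commonly cited per-channel CIFAR-10 statistics):

```python
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    # Commonly used per-channel CIFAR-10 mean/std.
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                         (0.2470, 0.2435, 0.2616)),
])

train_set = datasets.CIFAR10(root="./data", train=True,
                             download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                           shuffle=True, num_workers=2)
```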
