# CIFAR-10 Vision Transformer

The notebook covers:

- Building a Vision Transformer (ViT) from scratch
- Fine-tuning a pre-trained ViT
- Comparing ViTs and CNNs

## Vision Transformer

In the first part, we train a ViT from scratch for 20 epochs, using 4 attention heads, 4 transformer layers, and an embedding dimension of 64.
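As a rough sketch of that configuration (the notebook's exact patch size and MLP width are not stated here, so `patch_size=4` and a 4× feed-forward width are assumptions), a from-scratch ViT in PyTorch might look like:

```python
import torch
import torch.nn as nn

class ViT(nn.Module):
    # Hyperparameters follow the README: 4 heads, 4 layers, embedding dim 64.
    # patch_size=4 is an assumed value, giving 8x8 = 64 patches per CIFAR image.
    def __init__(self, image_size=32, patch_size=4, num_classes=10,
                 embed_dim=64, num_heads=4, num_layers=4):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # A strided conv splits the image into non-overlapping patch embeddings.
        self.patch_embed = nn.Conv2d(3, embed_dim,
                                     kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads,
            dim_feedforward=embed_dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        x = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, patches, dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.head(x[:, 0])  # classify from the [CLS] token

model = ViT()
logits = model(torch.randn(2, 3, 32, 32))
print(logits.shape)  # torch.Size([2, 10])
```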

## Pre-trained Vision Transformer

We also fine-tune a pre-trained ViT, originally trained on ImageNet-21k at 224 × 224 resolution, for 3 epochs.
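Loading such a model for CIFAR-10 fine-tuning can be sketched as follows; the checkpoint id `google/vit-base-patch16-224-in21k` is an assumption (it is the standard Hugging Face ImageNet-21k ViT, which may or may not be the one used in the notebook), and CIFAR-10 images must be resized to 224 × 224 first:

```python
import torch
from transformers import ViTForImageClassification

# Assumed checkpoint: the standard ImageNet-21k ViT-Base from Hugging Face.
# num_labels=10 replaces the classification head for CIFAR-10.
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k", num_labels=10)

# CIFAR-10 images (32x32) are resized to the model's 224x224 input resolution.
pixel_values = torch.randn(1, 3, 224, 224)
logits = model(pixel_values).logits
print(logits.shape)  # (1, 10)
```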

## Pre-trained CNN

For comparison, a pre-trained ResNet-18 is fine-tuned on the dataset for 3 epochs.

## Results
