The building blocks of the SegFormer architecture:
- Overlap Patch Embedding - converts an image into a sequence of overlapping patches (see the first sketch after this list).
- Efficient Self-Attention - the first core component of Transformer-based models; SegFormer keeps it cheap by spatially reducing the keys and values (second sketch below).
- Mix-FeedForward (Mix-FFN) module - the second core component; together with self-attention it forms a single Transformer block (third sketch below).
- Transformer block - self-attention + Mix-FFN + LayerNorm form a basic Transformer block.
- Decoder head - contains only MLP layers; it fuses the multi-scale encoder features into the segmentation map (fourth sketch below).
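Below is a minimal PyTorch sketch of overlap patch embedding, assuming a strided convolution with a kernel larger than its stride (kernel 7, stride 4 is the typical first-stage setting); the class name and default dimensions are illustrative, not the original implementation.

```python
import torch
import torch.nn as nn

class OverlapPatchEmbed(nn.Module):
    """Turn an image into a sequence of overlapping patch tokens."""
    def __init__(self, in_chans=3, embed_dim=64, patch_size=7, stride=4):
        super().__init__()
        # stride < patch_size, so neighbouring patches overlap
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size,
                              stride=stride, padding=patch_size // 2)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):
        x = self.proj(x)                    # (B, C, H', W')
        B, C, H, W = x.shape
        x = x.flatten(2).transpose(1, 2)    # (B, H'*W', C) token sequence
        x = self.norm(x)
        return x, H, W
```

With these defaults a 512x512 image becomes a 128x128 grid of 64-dimensional tokens.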
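A sketch of efficient self-attention: standard multi-head attention, except the keys and values are computed from a spatially reduced copy of the sequence (`sr_ratio` controls the reduction). The module layout and defaults here are assumptions for illustration.

```python
class EfficientSelfAttention(nn.Module):
    """Multi-head self-attention with spatially reduced keys and values."""
    def __init__(self, dim=64, num_heads=1, sr_ratio=8):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, dim * 2)
        self.proj = nn.Linear(dim, dim)
        # strided conv shrinks the K/V sequence length by sr_ratio**2
        self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio) if sr_ratio > 1 else None
        self.norm = nn.LayerNorm(dim) if sr_ratio > 1 else None

    def forward(self, x, H, W):
        B, N, C = x.shape
        q = self.q(x).reshape(B, N, self.num_heads, C // self.num_heads).transpose(1, 2)
        if self.sr is not None:
            x_ = x.transpose(1, 2).reshape(B, C, H, W)
            x_ = self.sr(x_).reshape(B, C, -1).transpose(1, 2)   # shorter sequence
            x_ = self.norm(x_)
        else:
            x_ = x
        kv = self.kv(x_).reshape(B, -1, 2, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
        k, v = kv[0], kv[1]
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```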
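A sketch of Mix-FFN and the Transformer block built from it. Mix-FFN inserts a 3x3 depth-wise convolution between the two linear layers, which injects positional information without explicit positional encodings; the block combines it with the attention module above using pre-norm residual connections. Names and expansion factors are illustrative.

```python
class MixFFN(nn.Module):
    """Feed-forward layer with a 3x3 depth-wise conv between the two linears."""
    def __init__(self, dim=64, expansion=4):
        super().__init__()
        hidden = dim * expansion
        self.fc1 = nn.Linear(dim, hidden)
        self.dwconv = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x, H, W):
        B, N, C = x.shape
        x = self.fc1(x)
        x = x.transpose(1, 2).reshape(B, -1, H, W)          # back to a 2D map
        x = self.dwconv(x).flatten(2).transpose(1, 2)       # and back to tokens
        x = self.act(x)
        return self.fc2(x)


class TransformerBlock(nn.Module):
    """LayerNorm -> Efficient Self-Attention -> LayerNorm -> Mix-FFN, with residuals."""
    def __init__(self, dim=64, num_heads=1, sr_ratio=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = EfficientSelfAttention(dim, num_heads, sr_ratio)  # from the sketch above
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = MixFFN(dim)

    def forward(self, x, H, W):
        x = x + self.attn(self.norm1(x), H, W)
        x = x + self.ffn(self.norm2(x), H, W)
        return x
```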
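Finally, a sketch of the all-MLP decoder head: each encoder stage is projected to a common width, upsampled to the highest-resolution stage, concatenated, fused, and mapped to class logits. The stage dimensions and the 1x1-conv fusion here are my own simplifications, not the exact released code.

```python
class MLPDecoderHead(nn.Module):
    """Project each stage to a common width, upsample, concatenate, fuse, predict."""
    def __init__(self, in_dims=(64, 128, 320, 512), embed_dim=256, num_classes=3):
        super().__init__()
        self.linears = nn.ModuleList([nn.Linear(d, embed_dim) for d in in_dims])
        self.fuse = nn.Conv2d(embed_dim * len(in_dims), embed_dim, 1)
        self.pred = nn.Conv2d(embed_dim, num_classes, 1)

    def forward(self, features):
        # features: list of (B, C_i, H_i, W_i) maps from the four encoder stages
        target_size = features[0].shape[2:]
        outs = []
        for f, lin in zip(features, self.linears):
            B, C, H, W = f.shape
            f = lin(f.flatten(2).transpose(1, 2))            # (B, H*W, embed_dim)
            f = f.transpose(1, 2).reshape(B, -1, H, W)
            f = nn.functional.interpolate(f, size=target_size,
                                          mode="bilinear", align_corners=False)
            outs.append(f)
        x = self.fuse(torch.cat(outs, dim=1))
        return self.pred(x)                                   # (B, num_classes, H_1, W_1)
```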
Here is the result of a model trained on BDD100K drivable-area segmentation:
Here are the attention maps from the video above: