Implementation of causal self-attention that closely follows 3Blue1Brown's "Attention in transformers, visually explained | Chapter 6, Deep Learning" video.
`attention.ipynb` contains the code sections that closely correspond to what 3B1B talks about in the video, in chronological order (implements causal self-attention).
`attention.py` has the code bundled into a nice class ready for use (does not implement causal attention).
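For reference, a class with this interface would typically look something like the sketch below. This assumes PyTorch and the scaled dot-product self-attention presented in the video; the actual contents of `attention.py` may differ in detail.

```python
import torch
import torch.nn as nn


class SelfAttention(nn.Module):
    """Minimal sketch of single-head scaled dot-product self-attention."""

    def __init__(self, d_in, d_out_kq, d_out_v):
        super().__init__()
        self.W_query = nn.Parameter(torch.rand(d_in, d_out_kq))
        self.W_key = nn.Parameter(torch.rand(d_in, d_out_kq))
        self.W_value = nn.Parameter(torch.rand(d_in, d_out_v))

    def forward(self, x):
        # x: (number of tokens, d_in)
        queries = x @ self.W_query   # (tokens, d_out_kq)
        keys = x @ self.W_key        # (tokens, d_out_kq)
        values = x @ self.W_value    # (tokens, d_out_v)

        # Scaled dot-product scores, normalized with a softmax over each row
        scores = queries @ keys.T
        attn_weights = torch.softmax(scores / keys.shape[-1] ** 0.5, dim=-1)

        # Each output row is a weighted sum of the value vectors
        return attn_weights @ values
```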
Single-head usage:

```python
import torch
import attention

d_in, d_out_kq, d_out_v = 120, 12, 120
x = torch.randn(6, d_in)  # example input: one embedding vector per token

s = attention.SelfAttention(d_in, d_out_kq, d_out_v)
s(x)
```

Output shape: `(number of tokens, token embedding dimension)`
Multi-head usage:

```python
import torch
import attention

d_in, d_out_kq, d_out_v, num_heads = 120, 12, 120, 4
x = torch.randn(6, d_in)  # example input: one embedding vector per token

m = attention.MultiHeadAttention(d_in, d_out_kq, d_out_v, num_heads)
m(x)
```

Output shape: `(num_heads, number of tokens, token embedding dimension)`
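Given that output shape, the heads are presumably run independently and their outputs stacked along a new leading dimension rather than concatenated. A minimal sketch, reusing the `SelfAttention` sketch above and again assuming PyTorch:

```python
import torch
import torch.nn as nn


class MultiHeadAttention(nn.Module):
    """Minimal sketch: num_heads independent attention heads, outputs stacked."""

    def __init__(self, d_in, d_out_kq, d_out_v, num_heads):
        super().__init__()
        self.heads = nn.ModuleList(
            [SelfAttention(d_in, d_out_kq, d_out_v) for _ in range(num_heads)]
        )

    def forward(self, x):
        # Each head maps (tokens, d_in) -> (tokens, d_out_v); stacking gives
        # (num_heads, tokens, d_out_v).
        return torch.stack([head(x) for head in self.heads], dim=0)
```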
- Implement causal self-attention in `attention.py` (one possible approach is sketched below).
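A possible way to add the causal mask, following the video's approach of hiding future tokens before the softmax (a sketch under the same scaled dot-product assumptions as above; `causal_attention_weights` is a hypothetical helper, not an existing function in `attention.py`):

```python
import torch


def causal_attention_weights(queries, keys):
    # queries, keys: (number of tokens, d_out_kq)
    num_tokens = queries.shape[0]
    scores = queries @ keys.T / keys.shape[-1] ** 0.5

    # Mask out the strict upper triangle so each token only attends to
    # itself and earlier tokens
    mask = torch.triu(torch.ones(num_tokens, num_tokens, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))

    return torch.softmax(scores, dim=-1)  # rows sum to 1 over allowed positions
```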