GPT is built on a multi-head attention architecture. We offer here a very small instance based on Andrej Karpathy's nanoGPT. The default parameters give a model much smaller than nanoGPT, tuned for fast convergence on a very small data set (Shakespeare).
The model takes as input a sequence of existing text (the context) and produces as output a prediction of the next character. More precisely, it predicts the next character for each initial sub-sequence of the input, which in effect gives an extra degree of parallelism during training.
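Concretely, for a context of length `block_size`, the training targets are simply the same sequence shifted forward by one character. A minimal sketch of that pairing (the snippet and variable names are illustrative, not taken from gpt.jl):

```julia
# Illustrative only: build one (input, target) pair from raw text.
text = "First Citizen: Before we proceed any further, hear me speak."
block_size = 16

x = collect(text[1:block_size])       # context characters 1 .. block_size
y = collect(text[2:block_size + 1])   # target: the character following each position

# At position t the model sees x[1:t] and is trained to predict y[t],
# so one sequence of length block_size yields block_size predictions at once.
```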
For the attention mechanism, we use Flux.MultiHeadAttention.
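A minimal sketch of causal self-attention with `Flux.MultiHeadAttention` (the shapes and hyperparameters below are illustrative, not the exact values used in gpt.jl):

```julia
using Flux, NNlib

n_embed, n_heads, block_size, batch = 64, 4, 32, 8

mha = MultiHeadAttention(n_embed; nheads = n_heads)

x = randn(Float32, n_embed, block_size, batch)   # (features, sequence, batch)

# Causal mask so position t can only attend to positions 1:t.
mask = NNlib.make_causal_mask(x)                 # block_size × block_size Bool matrix

y, attn = mha(x; mask = mask)                    # self-attention: q = k = v = x
size(y)                                          # (n_embed, block_size, batch)
```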
To train the model, run:
cd text/gpt
julia --project gpt.jl
After one epoch:
generate(model, "_", 50) = "_me, but plept fairs, And heards, verchean my word"
generate(model, "_", 50) = "_ows know yought, This alce! totether him. weliest"
generate(model, "The", 50) = "These prurd passtion? CINCESSIT: He eloucy I must"
generate(model, "The", 50) = "The bitherse dresic in to so shall with a his the "
After 20 epochs:
generate(model, "_", 50) = "_ething a calling do me diseases Of, on he's to th"
generate(model, "_", 50) = "_ ragg Thou flatters all in wators the selfsarut o"
generate(model, "The", 50) = "The Mirtouggake Go: For my mischance lords his sea"
generate(model, "The", 50) = "The oll-gakemoremo his dead: All this man make gen"