What is this?
Andrej Karpathy wrote a single Python file that builds up a GPT transformer from scratch — no libraries, no frameworks, just pure Python and basic maths. He did it in six incremental steps, each adding one “layer of the onion.”
This tutorial goes one step further: we break each of Karpathy’s steps into substeps, and explain each one with diagrams, animations, and interactive visualizations. By the end, you’ll understand every line — not just what it does, but why.
The Six Steps
Gradient Descent (coming soon)
A single-layer MLP bigram model, trained by gradient descent.

Autograd (coming soon)
The same MLP, but now trained with automatic differentiation.

Attention (coming soon)
Single-head attention with position embeddings.

Transformer (coming soon)
A full GPT transformer, trained with SGD.

Adam (coming soon)
The full GPT transformer, trained with Adam. This is the final model.
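To give a flavor of step 1, here is a pure-Python sketch of a bigram model trained by gradient descent. Everything in it is made up for illustration — the three-token vocabulary, the training pairs, the learning rate — and it is not Karpathy's actual code, just the same idea in miniature: one weight row per previous token, softmax over next-token logits, and a hand-derived cross-entropy gradient.

```python
import math, random

# Hypothetical toy setup (not from the tutorial): a 3-token vocabulary
# and the bigram pairs found in the training string "ab."
vocab = ["a", "b", "."]
V = len(vocab)
stoi = {ch: i for i, ch in enumerate(vocab)}
pairs = [(stoi["a"], stoi["b"]), (stoi["b"], stoi["."])]

random.seed(0)
# one weight row per previous token; row i holds the logits for the next token
W = [[random.gauss(0, 0.1) for _ in range(V)] for _ in range(V)]

def softmax(logits):
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

lr = 1.0
for step in range(100):
    loss = 0.0
    grad = [[0.0] * V for _ in range(V)]
    for prev, nxt in pairs:
        probs = softmax(W[prev])
        loss += -math.log(probs[nxt])    # cross-entropy on this pair
        for j in range(V):               # d(loss)/d(logit_j) = prob_j - 1{j == nxt}
            grad[prev][j] += probs[j] - (1.0 if j == nxt else 0.0)
    for i in range(V):                   # plain gradient-descent step
        for j in range(V):
            W[i][j] -= lr * grad[i][j] / len(pairs)

print(round(loss / len(pairs), 4))       # average loss shrinks toward 0
```

After training, the row for "a" puts most of its probability on "b" — exactly the bigram statistic the model was asked to learn.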
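Step 2 replaces those hand-derived gradients with automatic differentiation. A minimal scalar autograd engine (in the style of Karpathy's micrograd, but written here only as an illustration — the class name and its fields are not the tutorial's) can fit in a few dozen lines: each operation records its inputs and local derivatives, and `backward` applies the chain rule in reverse topological order.

```python
class Value:
    """Minimal scalar autograd node: stores data, grad, and how it was made."""
    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._children = children        # the Values this node was computed from
        self._local_grads = local_grads  # d(self)/d(child) for each child

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        # topological sort, then chain rule from the output backwards
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            for child, lg in zip(v._children, v._local_grads):
                child.grad += lg * v.grad

x = Value(3.0)
y = Value(2.0)
z = x * y + x       # dz/dx = y + 1 = 3, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)  # → 3.0 3.0
```

Note that `x` appears twice in the graph, so its gradient correctly accumulates contributions from both paths.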
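Step 3's single-head attention with position embeddings can also be sketched in pure Python. The numbers below are invented for illustration (three positions, model dimension 2, and identity Q/K/V projections to keep it readable — real models learn all of these); the mechanics are the standard ones: add position embeddings to token embeddings, score each query against the keys at or before it, softmax, and take a weighted sum of values.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical tiny inputs: T = 3 positions, model dimension d = 2.
tok_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # token embeddings
pos_emb = [[0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]   # position embeddings
x = [[t + p for t, p in zip(te, pe)] for te, pe in zip(tok_emb, pos_emb)]

# identity projections keep the sketch readable; a real model learns these
Wq = Wk = Wv = [[1.0, 0.0], [0.0, 1.0]]
d = 2

q = [matvec(Wq, xi) for xi in x]
k = [matvec(Wk, xi) for xi in x]
v = [matvec(Wv, xi) for xi in x]

out = []
for t in range(len(x)):   # causal mask: position t only sees positions 0..t
    scores = [sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
              for s in range(t + 1)]
    w = softmax(scores)
    out.append([sum(w[s] * v[s][j] for s in range(t + 1)) for j in range(d)])

print([[round(c, 2) for c in row] for row in out])
```

The causal mask means position 0 can only attend to itself, so its output is exactly its own value vector — a handy sanity check.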
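Finally, the Adam update from step 5, shown on a one-dimensional quadratic rather than a transformer — the function, learning rate, and step count are all chosen for illustration. The update keeps exponential moving averages of the gradient and its square, corrects their startup bias, and scales each step by the ratio.

```python
import math

# Minimize the toy function f(w) = (w - 5)^2, whose gradient is 2*(w - 5).
w, lr = 0.0, 0.1
beta1, beta2, eps = 0.9, 0.999, 1e-8     # standard Adam hyperparameters
m = v = 0.0
for t in range(1, 1001):
    g = 2 * (w - 5)
    m = beta1 * m + (1 - beta1) * g      # first moment: running mean of grads
    v = beta2 * v + (1 - beta2) * g * g  # second moment: running mean of grad^2
    m_hat = m / (1 - beta1 ** t)         # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (math.sqrt(v_hat) + eps)

print(round(w, 3))                       # approaches the minimum at w = 5
```

The per-parameter scaling by the second moment is what lets Adam use one learning rate across parameters whose gradients have very different magnitudes — the main reason it replaces plain SGD in the final step.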