MicroGPT Visualized

Building a GPT from scratch — an interactive visual guide

What is this?

Andrej Karpathy wrote a single Python file that builds up a GPT transformer from scratch — no libraries, no frameworks, just pure Python and basic maths. He did it in six incremental steps, each adding one “layer of the onion.”

This tutorial goes one step further: we break each of Karpathy’s steps into substeps, and explain each one with diagrams, animations, and interactive visualizations. By the end, you’ll understand every line — not just what it does, but why.

The Six Steps

Step 1

Gradient Descent

A single-layer MLP bigram model, trained by gradient descent with hand-derived gradients.

Coming soon
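To give a feel for what Step 1 covers, here is a minimal sketch (not Karpathy's actual code) of a bigram model trained by gradient descent: a weight matrix maps the previous token id to logits over the next token, and the cross-entropy gradient is derived by hand (`probs - one_hot`). The vocabulary and training pairs are toy placeholders.

```python
import math, random

random.seed(0)
vocab = ["a", "b", "c"]          # toy vocabulary (hypothetical)
V = len(vocab)
data = [(0, 1), (1, 2), (2, 0)]  # (prev, next) token-id pairs

# W[i][j] = logit for "token j follows token i"
W = [[random.gauss(0, 0.1) for _ in range(V)] for _ in range(V)]
lr = 1.0

for step in range(200):
    loss = 0.0
    grad = [[0.0] * V for _ in range(V)]
    for prev, nxt in data:
        logits = W[prev]
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        Z = sum(exps)
        probs = [e / Z for e in exps]
        loss -= math.log(probs[nxt])
        # hand-derived gradient of cross-entropy: probs - one_hot
        for j in range(V):
            grad[prev][j] += probs[j] - (1.0 if j == nxt else 0.0)
    # plain gradient-descent update
    for i in range(V):
        for j in range(V):
            W[i][j] -= lr * grad[i][j] / len(data)

avg_loss = loss / len(data)
```

On this deterministic toy dataset the loss drops close to zero, since each previous token has exactly one valid successor.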
Step 2

Autograd

The same MLP, but now trained with automatic differentiation.

Coming soon
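The idea behind Step 2 can be sketched with a tiny scalar autograd engine in the spirit of Karpathy's micrograd (this is an illustrative simplification, not the tutorial's code): each value records which values produced it and the local derivatives, and `backward()` walks the graph in reverse applying the chain rule.

```python
import math

class Value:
    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._children = children        # Values this one depends on
        self._local_grads = local_grads  # d(self)/d(child) for each child

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def tanh(self):
        t = math.tanh(self.data)
        return Value(t, (self,), (1.0 - t * t,))

    def backward(self):
        # topological sort, then chain rule in reverse order
        topo, seen = [], set()
        def build(v):
            if id(v) not in seen:
                seen.add(id(v))
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            for c, g in zip(v._children, v._local_grads):
                c.grad += v.grad * g

# usage: gradient of tanh(x*w + b) at x=0.5, w=2.0, b=0.0
x, w, b = Value(0.5), Value(2.0), Value(0.0)
y = (x * w + b).tanh()
y.backward()
```

After `backward()`, `x.grad` holds d(y)/dx computed automatically, matching the analytic derivative `w * (1 - tanh(x*w + b)**2)`.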
Step 3

Attention

Single-head attention with position embeddings.

Coming soon
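Step 3's core computation can be sketched in pure Python (toy dimensions, identity projection matrices for simplicity; the embeddings below are hypothetical): token and position embeddings are summed, then each position attends over its prefix with scaled dot-product attention.

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

T, D = 4, 2                                           # sequence length, embedding dim (toy)
tok_emb = [[0.1 * (t + 1), 0.2] for t in range(T)]    # hypothetical token embeddings
pos_emb = [[0.0, 0.05 * t] for t in range(T)]         # position embeddings
x = [[a + b for a, b in zip(te, pe)] for te, pe in zip(tok_emb, pos_emb)]

# query/key/value projections (identity here, to keep the sketch small)
Wq = Wk = Wv = [[1.0, 0.0], [0.0, 1.0]]
q = [matvec(Wq, xi) for xi in x]
k = [matvec(Wk, xi) for xi in x]
v = [matvec(Wv, xi) for xi in x]

out = []
for t in range(T):
    # causal mask: position t attends only to positions <= t
    scores = [sum(qa * kb for qa, kb in zip(q[t], k[s])) / math.sqrt(D)
              for s in range(t + 1)]
    w = softmax(scores)
    out.append([sum(w[s] * v[s][d] for s in range(t + 1)) for d in range(D)])
```

Note how the first position can only attend to itself, so its output equals its own value vector.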
Step 4

Transformer

A full GPT transformer, trained with SGD.

Coming soon
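What turns single-head attention into a transformer is the residual block structure. Here is a minimal pre-norm sketch, `x = x + attn(norm(x))` followed by `x = x + mlp(norm(x))` (toy dimensions, arbitrary untrained weights, identity attention projections; a simplification, not the tutorial's code):

```python
import math

D = 2  # toy embedding dimension

def rmsnorm(v):
    scale = math.sqrt(sum(x * x for x in v) / len(v) + 1e-5)
    return [x / scale for x in v]

def mlp(v, W1, W2):
    # two-layer MLP with a tanh nonlinearity
    h = [math.tanh(sum(w * x for w, x in zip(row, v))) for row in W1]
    return [sum(w * x for w, x in zip(row, h)) for row in W2]

def attention(xs):
    # causal scaled dot-product attention (identity projections)
    out = []
    for t in range(len(xs)):
        scores = [sum(a * b for a, b in zip(xs[t], xs[s])) / math.sqrt(D)
                  for s in range(t + 1)]
        m = max(scores)
        e = [math.exp(s - m) for s in scores]
        Z = sum(e)
        w = [v / Z for v in e]
        out.append([sum(w[s] * xs[s][d] for s in range(t + 1))
                    for d in range(D)])
    return out

def block(xs, W1, W2):
    # residual connections around attention and MLP, pre-norm style
    att = attention([rmsnorm(x) for x in xs])
    xs = [[a + b for a, b in zip(x, y)] for x, y in zip(xs, att)]
    xs = [[a + b for a, b in zip(x, mlp(rmsnorm(x), W1, W2))] for x in xs]
    return xs

xs = [[0.1, 0.2], [0.3, 0.1], [0.0, 0.4]]            # toy input sequence
W1 = [[0.1, -0.2], [0.3, 0.1], [-0.1, 0.2], [0.2, 0.0]]   # D -> hidden(4)
W2 = [[0.1, 0.0, -0.1, 0.2], [0.0, 0.1, 0.2, -0.1]]       # hidden(4) -> D
out = block(xs, W1, W2)
```

A full GPT stacks several such blocks, adds the embedding and output layers, and trains everything end to end with SGD.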
Step 5

Adam

The full GPT transformer, trained with Adam. This is the final model.

Coming soon
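The Adam update rule at the heart of Step 5 can be sketched in a few lines: each parameter keeps running averages of its gradient (first moment) and squared gradient (second moment), with bias correction for the early steps. The quadratic objective below is just an illustrative stand-in.

```python
import math

def adam_step(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g        # running mean of gradients
    v = b2 * v + (1 - b2) * g * g    # running mean of squared gradients
    m_hat = m / (1 - b1 ** t)        # bias correction (t starts at 1)
    v_hat = v / (1 - b2 ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v

# usage: minimize f(p) = p^2 (gradient 2p), starting from p = 1.0
p, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    p, m, v = adam_step(p, 2.0 * p, m, v, t, lr=0.01)
```

Because the update divides by the square root of the second moment, the effective step size adapts per parameter, which is why Adam typically trains transformers far more reliably than plain SGD.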