MicroGPT Visualized

Building a GPT from scratch — an interactive visual guide

Step 0

Counting

A bigram language model trained by counting — the simplest possible language model.

  0.1 The Dataset
  0.2 The Tokenizer
  0.3 The Count Table
  0.4 The Model
  0.5 Training
  0.6 Loss
  0.7 Inference
The big idea: counting IS learning. For this simple model, counting bigrams in the training data gives the exact maximum-likelihood answer, with no gradient descent required.
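The whole pipeline above (tokenize, count, normalize, score, sample) can be sketched in a few lines. This is a minimal illustration of the counting idea, not the guide's actual code: the toy corpus and the use of "." as a start/end marker are assumptions made here for the example.

```python
import math
import random
from collections import defaultdict

# Toy corpus (hypothetical stand-in for the guide's dataset, sections 0.1-0.2).
words = ["emma", "olivia", "ava"]

# 0.3 The count table: counts[a][b] = how often token b follows token a.
# "." marks both the start and the end of a word (an assumed convention).
counts = defaultdict(lambda: defaultdict(int))
for w in words:
    tokens = ["."] + list(w) + ["."]
    for a, b in zip(tokens, tokens[1:]):
        counts[a][b] += 1

# 0.4-0.5 The model / "training": normalize each row into a distribution.
probs = {a: {b: c / sum(row.values()) for b, c in row.items()}
         for a, row in counts.items()}

# 0.6 Loss: average negative log-likelihood of the training data.
nll, n = 0.0, 0
for w in words:
    tokens = ["."] + list(w) + ["."]
    for a, b in zip(tokens, tokens[1:]):
        nll -= math.log(probs[a][b])
        n += 1
print(f"loss: {nll / n:.4f}")

# 0.7 Inference: sample one token at a time until "." is produced.
random.seed(0)
out, tok = [], "."
while True:
    cands, weights = zip(*probs[tok].items())
    tok = random.choices(cands, weights=weights)[0]
    if tok == ".":
        break
    out.append(tok)
print("sampled:", "".join(out))
```

Note that "training" here is just filling in the count table: the normalized counts are already the maximum-likelihood bigram probabilities, so no iterative optimization is needed.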
Step 1: Gradient Descent →