MicroGPT Visualized

Building a GPT from scratch — an interactive visual guide

Step 0

Counting

The simplest possible language model: a bigram model trained by counting.

0.1 The Dataset
0.2 The Tokenizer
0.3 The Count Table
0.4 The Model
0.5 Training
0.6 Loss
0.7 Inference

Key insight: Counting IS learning. For a bigram model, the normalized count table is the exact maximum-likelihood solution, so no iterative training is needed. Gradient descent is what you need when the model is too expressive for a closed-form solution.
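The whole Step 0 pipeline fits in a few lines. Below is a minimal sketch, assuming a character-level tokenizer and a tiny toy corpus of my own choosing (the guide's actual dataset and variable names may differ): build the count table, normalize it into the model, measure the loss, and sample.

```python
import math
import random

# Toy corpus (an assumption; stand-in for the guide's dataset).
corpus = ["emma", "olivia", "ava", "isabella", "mia"]

# 0.2 Tokenizer: each character is a token; "." marks start/end of a word.
chars = sorted(set("".join(corpus)))
tokens = ["."] + chars
stoi = {t: i for i, t in enumerate(tokens)}
V = len(tokens)

# 0.3 Count table: counts[i][j] = how often token j follows token i.
counts = [[0] * V for _ in range(V)]
for word in corpus:
    seq = ["."] + list(word) + ["."]
    for a, b in zip(seq, seq[1:]):
        counts[stoi[a]][stoi[b]] += 1

# 0.4 The model: normalize each row into a probability distribution.
# "Training" (0.5) is just this counting + normalization, done in one pass.
# Add-one smoothing avoids log(0) on bigrams never seen in the corpus.
probs = []
for row in counts:
    total = sum(row) + V
    probs.append([(c + 1) / total for c in row])

# 0.6 Loss: average negative log-likelihood of the training data.
nll, n = 0.0, 0
for word in corpus:
    seq = ["."] + list(word) + ["."]
    for a, b in zip(seq, seq[1:]):
        nll -= math.log(probs[stoi[a]][stoi[b]])
        n += 1
print(f"avg NLL: {nll / n:.4f}")

# 0.7 Inference: sample one token at a time until "." is produced.
random.seed(0)
i = stoi["."]
out = []
while True:
    i = random.choices(range(V), weights=probs[i])[0]
    if i == stoi["."]:
        break
    out.append(tokens[i])
print("sample:", "".join(out))
```

Note that "training" here never touches a gradient: the count table, once normalized, already maximizes the likelihood of the data, which is exactly the closed-form solution the key insight describes.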