Step 0

Counting

A bigram language model trained by counting — the simplest possible language model.

0.1 The Dataset
0.2 The Tokenizer
0.3 The Count Table
0.4 The Model
0.5 Training
0.6 Loss
0.7 Inference

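Key insight: Counting IS learning. For this simple model, counting bigrams and normalizing the counts is the closed-form maximum-likelihood solution, so no iterative optimization is needed. Gradient descent is what you need when the model is too expressive for an exact solution.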
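
To make the key insight concrete, here is a minimal sketch of a bigram model trained purely by counting. It rests on assumptions that are not taken from the sections listed above: tokens are single characters, "." marks the start and end of each word, and the three-word dataset is made up for illustration. The names train_bigram and average_nll are likewise hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy dataset, used only for illustration.
words = ["emma", "ava", "mia"]

BOUNDARY = "."  # assumed start/end marker


def bigrams(word):
    """Yield (previous, next) character pairs, including the boundaries."""
    chars = [BOUNDARY] + list(word) + [BOUNDARY]
    return zip(chars, chars[1:])


def train_bigram(words):
    """'Training' is counting: tally each bigram, then normalize each row."""
    counts = defaultdict(Counter)
    for word in words:
        for prev, nxt in bigrams(word):
            counts[prev][nxt] += 1
    probs = {}
    for prev, row in counts.items():
        total = sum(row.values())
        probs[prev] = {nxt: c / total for nxt, c in row.items()}
    return probs


def average_nll(probs, words):
    """Average negative log-likelihood per bigram (lower is better)."""
    total, n = 0.0, 0
    for word in words:
        for prev, nxt in bigrams(word):
            total -= math.log(probs[prev][nxt])
            n += 1
    return total / n


probs = train_bigram(words)
loss = average_nll(probs, words)

# Moving any row away from the normalized counts raises the training loss.
perturbed = {prev: dict(row) for prev, row in probs.items()}
perturbed["a"] = {".": 0.5, "v": 0.5}
perturbed_loss = average_nll(perturbed, words)

print(f"counting loss:  {loss:.4f}")
print(f"perturbed loss: {perturbed_loss:.4f}")
```

In this sketch the normalized counts give the lowest training loss that any table of next-character probabilities can reach, which is why no optimizer is involved. The perturbed table differs only in the row for "a", and its loss is strictly higher.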