Counting
A bigram language model trained by counting — the simplest possible language model.
- 0.1 The Dataset
- 0.2 The Tokenizer
- 0.3 The Count Table
- 0.4 The Model
- 0.5 Training
- 0.6 Loss
- 0.7 Inference
Key insight: Counting IS learning. For this bigram model, normalizing the counts is the closed-form maximum-likelihood solution, so no iterative optimization is needed. Gradient descent is what you need when the model is too expressive to have an exact solution.
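The idea can be sketched end to end in a few lines. This is a minimal illustration, not the chapter's actual code: the three-word dataset, the character-level tokenizer, the `.` boundary token, and the names `bigrams`, `average_nll`, and `sample` are all assumptions made for the example.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy dataset; the chapter's real dataset is not shown here.
words = ["emma", "ava", "mia"]

BOUNDARY = "."  # assumed token marking both the start and the end of a word


def bigrams(word):
    """Yield (previous, next) character pairs, including the boundaries."""
    chars = [BOUNDARY] + list(word) + [BOUNDARY]
    return zip(chars, chars[1:])


# "Training": count every adjacent pair in the dataset.
counts = defaultdict(Counter)
for w in words:
    for prev, nxt in bigrams(w):
        counts[prev][nxt] += 1

# The model: normalize each row of counts into a probability distribution.
probs = {
    prev: {nxt: c / sum(row.values()) for nxt, c in row.items()}
    for prev, row in counts.items()
}


def average_nll(data):
    """Loss: average negative log-likelihood per bigram."""
    pairs = [p for w in data for p in bigrams(w)]
    return -sum(math.log(probs[a][b]) for a, b in pairs) / len(pairs)


def sample(rng):
    """Inference: draw characters until the boundary token is sampled."""
    out, prev = [], BOUNDARY
    while True:
        row = probs[prev]
        prev = rng.choices(list(row), weights=list(row.values()))[0]
        if prev == BOUNDARY:
            return "".join(out)
        out.append(prev)


loss = average_nll(words)
name = sample(random.Random(0))
```

There is no training loop here: one pass of counting followed by row normalization yields the probabilities directly. Note that this sketch applies no smoothing, so `average_nll` raises a `KeyError` on any bigram that never appeared in the training data.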