The Count Table
So far
vocab_size— 27
The “model” is a 2D table. That’s it. No neural network, no weights — just a grid of counts.
Each cell state_dict[i][j] will record how many times token jcol has followed token irow in the training data.
state_dict = [[0] * vocab_size for _ in range(vocab_size)]
That’s 27 rows and 27 columns — one for each token in our vocabulary — giving us 729 cells. Every cell starts at zero.
The rows represent the current token. The columns represent the next token. So state_dict[0][21] will eventually hold the count of how many times v21 followed a0 in the training data.
Here’s the full table — 27 rows, 27 columns, all zeros:
This is the entire model. There’s nothing else to initialize — no random weights, no architecture decisions. Just an empty table, waiting to be filled by training.