MicroGPT Visualized

Building a GPT from scratch — an interactive visual guide


The Count Table

So far

  • vocab_size — 27

The “model” is a 2D table. That’s it. No neural network, no weights — just a grid of counts.

Each cell state_dict[i][j] will record how many times token j (the column) has followed token i (the row) in the training data.

state_dict = [[0] * vocab_size for _ in range(vocab_size)]

That’s 27 rows and 27 columns — one for each token in our vocabulary — giving us 729 cells. Every cell starts at zero.
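As a quick sanity check, the shape and contents of the table can be confirmed in a few lines. The sketch below assumes vocab_size is 27, as above; the names num_rows, num_cols, num_cells, all_zero, and aliased are introduced here for illustration only. It also shows why the list comprehension matters: the shorter-looking [[0] * vocab_size] * vocab_size would repeat one row object 27 times instead of creating 27 separate rows.

```python
vocab_size = 27  # from the tokenizer step

# One independent row of zeros per token.
state_dict = [[0] * vocab_size for _ in range(vocab_size)]

num_rows = len(state_dict)
num_cols = len(state_dict[0])
num_cells = sum(len(row) for row in state_dict)
all_zero = all(cell == 0 for row in state_dict for cell in row)

print(num_rows, num_cols, num_cells, all_zero)  # 27 27 729 True

# Pitfall (illustration only): multiplying the outer list copies references,
# so every "row" is the same list object.
aliased = [[0] * vocab_size] * vocab_size
aliased[0][21] += 1
print(aliased[5][21])     # 1, the write shows up in every row

# With the comprehension, a write to one row leaves the others alone.
state_dict[0][21] += 1
print(state_dict[5][21])  # 0, rows are independent
state_dict[0][21] -= 1    # restore the all-zero table
```

With the aliased version, every count added during training would land in all 27 rows at once, so the table would stop telling rows apart.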

The rows represent the current token. The columns represent the next token. So state_dict[0][21] will eventually hold the number of times v (token 21) followed a (token 0) in the training data.
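To make the row/column convention concrete, here is a minimal sketch of how a single cell would be updated. It assumes the letters a to z map to ids 0 to 25, which is consistent with a being 0 and v being 21 above. The helper encode and the sample word "ava" are hypothetical and not part of the guide's code, and the actual training loop may differ.

```python
vocab_size = 27
state_dict = [[0] * vocab_size for _ in range(vocab_size)]

def encode(word):
    """Hypothetical helper: map lowercase letters a..z to ids 0..25."""
    return [ord(ch) - ord("a") for ch in word]

tokens = encode("ava")  # [0, 21, 0]

# Walk over adjacent pairs: row = current token, column = next token.
for current, nxt in zip(tokens, tokens[1:]):
    state_dict[current][nxt] += 1

print(state_dict[0][21])   # 1: v followed a once
print(state_dict[21][0])   # 1: a followed v once
print(state_dict[21][21])  # 0: v never followed v
```

Note that the order of the indices matters: state_dict[0][21] and state_dict[21][0] are different cells that count different pairs.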

[Interactive table: 27 rows by 27 columns, every cell zero.]
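One way to look at the table in a terminal is to print it with a label on each row and column. The sketch below is only an illustration: the render function and the labels list are hypothetical names, the letters a to z are assumed to be ids 0 to 25, and "*" is a placeholder label for the 27th token, whose meaning this section does not specify.

```python
import string

vocab_size = 27
state_dict = [[0] * vocab_size for _ in range(vocab_size)]

# Assumed labels: a..z for ids 0..25, "*" as a placeholder for id 26.
labels = list(string.ascii_lowercase) + ["*"]

def render(table, labels):
    """Return the table as text with a header row and a label per row."""
    header = "  " + " ".join(labels)
    rows = [
        labels[i] + " " + " ".join(str(cell) for cell in row)
        for i, row in enumerate(table)
    ]
    return "\n".join([header] + rows)

text = render(state_dict, labels)
print(text)
```

The columns line up only while every count is a single digit, which holds for the empty table; a wider field per cell would be needed once counts grow.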

This is the entire model. There’s nothing else to initialize — no random weights, no architecture decisions. Just an empty table, waiting to be filled by training.
