MicroGPT Visualized

Building a GPT from scratch — an interactive visual guide

Step 2

Autograd

The same multilayer perceptron (MLP), but now trained with automatic differentiation — no more hand-written backward pass.

  - 2.1 What Changes
  - 2.2 The Value Class: Wrapping Numbers
  - 2.3 Recording Operations: Add and Multiply
  - 2.4 More Operations and Syntactic Sugar
  - 2.5 Backward: Walking the Graph
  - 2.6 Parameters as Values
  - 2.7 The New Training Loop
  - 2.8 Same Results, Less Code
The big idea: Every operation records how it was computed. The chain rule can then be applied automatically by walking backward through the computation graph.
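That idea can be sketched in a few dozen lines. The `Value` class below is a minimal illustration (not the guide's exact implementation): each arithmetic operation stores its inputs and a closure that knows how to push gradients back to them, and `backward()` applies the chain rule by visiting nodes in reverse topological order.

```python
class Value:
    """A scalar that remembers how it was computed, enabling reverse-mode autodiff."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None  # closure that propagates grad to children
        self._prev = set(_children)    # the inputs that produced this node

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(out)/d(self) = 1 and d(out)/d(other) = 1, so grads pass through
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # product rule: each input's grad is scaled by the other input's value
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Build a topological order so each node runs after everything it feeds into.
        topo, visited = [], set()

        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)

        build(self)
        self.grad = 1.0  # seed: d(self)/d(self) = 1
        for v in reversed(topo):
            v._backward()


# Example: f = a*b + a, so df/da = b + 1 and df/db = a
a, b = Value(2.0), Value(3.0)
f = a * b + a
f.backward()
print(a.grad, b.grad)  # → 4.0 2.0
```

Note that gradients accumulate with `+=` rather than being assigned: a node used twice (like `a` above) receives contributions from every path through the graph, which is exactly what the multivariable chain rule requires.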
← Step 1: Gradient Descent | Step 3: Attention →