MicroGPT Visualized

Building a GPT from scratch — an interactive visual guide

← 2.2 The Value Class: Wrapping Numbers 2.4 More Operations and Syntactic Sugar →
Step 2: Autograd › 2.3

Recording Operations: Add and Multiply

Previously Defined

  • Value wraps a number with data, grad, _children, _local_grads

Every arithmetic operation on Values returns a new Value that records its parents and local gradients. Here are the two fundamental operations:

Addition

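    def __add__(self, other):
        # Wrap plain numbers so that both operands are Values.
        other = other if isinstance(other, Value) else Value(other)
        # Record the result, its two parents, and the local gradients (1, 1).
        return Value(self.data + other.data, (self, other), (1, 1))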

If c = a + b, then ∂c/∂a = 1 and ∂c/∂b = 1. Both local gradients are 1: addition passes the incoming gradient through to each parent unchanged.

The new Value c remembers that it came from a and b (stored in _children, which holds the inputs to the operation) and that both local gradients are 1 (stored in _local_grads). This is everything backward() will need later.
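To make the recorded state concrete, here is a minimal sketch that adds two Values and inspects the result. The constructor is an assumption inferred from the call Value(data, children, local_grads) inside __add__; the definition from section 2.2 may differ in detail. The inputs 2.0 and 3.0 are arbitrary.

```python
class Value:
    # Assumed constructor, inferred from how __add__ calls Value(...).
    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.grad = 0
        self._children = children
        self._local_grads = local_grads

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1, 1))


a = Value(2.0)
b = Value(3.0)
c = a + b

print(c.data)                                    # 5.0
print(c._children[0] is a, c._children[1] is b)  # True True
print(c._local_grads)                            # (1, 1)

# A plain number on the right-hand side is wrapped in a Value first.
d = a + 10
print(d.data, d._children[1].data)               # 12.0 10
```

Note that _children holds the very same objects a and b, not copies, which is what lets a later backward pass write gradients back into them.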

Multiplication

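    def __mul__(self, other):
        # Wrap plain numbers so that both operands are Values.
        other = other if isinstance(other, Value) else Value(other)
        # Local gradients are (dc/da, dc/db) = (b, a).
        return Value(self.data * other.data, (self, other), (other.data, self.data))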

If c = a * b, then ∂c/∂a = b and ∂c/∂b = a. The local gradients are swapped: each parent’s local gradient is the other parent’s value. For example, with a = 2 and b = 3, c = 6 and the local gradients are (3, 2).
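The swap can be checked numerically. The sketch below multiplies two Values, prints the recorded local gradients, and compares ∂c/∂a against a finite-difference estimate. As before, the constructor is an assumption inferred from the call signature, and the step size eps is an arbitrary choice for illustration.

```python
class Value:
    # Assumed constructor, inferred from how __mul__ calls Value(...).
    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.grad = 0
        self._children = children
        self._local_grads = local_grads

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))


a = Value(2.0)
b = Value(3.0)
c = a * b

print(c.data)          # 6.0
print(c._local_grads)  # (3.0, 2.0): b's value for a, a's value for b

# Finite-difference check: nudge a by eps and see how much c moves.
eps = 1e-6
numeric_dc_da = ((Value(a.data + eps) * b).data - c.data) / eps
print(abs(numeric_dc_da - c._local_grads[0]) < 1e-4)  # True
```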

These two operations — add and multiply — are the foundation. Every other operation is either built on top of them, or follows the same pattern with its own local gradient.
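As a rough illustration of both cases, the sketch below defines negation and subtraction purely in terms of add and multiply, and a power operation that follows the same pattern with its own local gradient. These three methods and the constructor are assumptions made for this example; the guide's own definitions may differ.

```python
class Value:
    # Assumed constructor, inferred from how __add__ and __mul__ call Value(...).
    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.grad = 0
        self._children = children
        self._local_grads = local_grads

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1, 1))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    # Built on top of add and multiply: no new gradient rule is needed.
    def __neg__(self):
        return self * -1

    def __sub__(self, other):
        return self + (-other)

    # Same pattern, own local gradient: d(x**n)/dx = n * x**(n-1).
    # Here n is assumed to be a plain number, not a Value.
    def __pow__(self, n):
        return Value(self.data ** n, (self,), (n * self.data ** (n - 1),))


a = Value(5.0)
b = Value(3.0)

d = a - b                          # evaluated as a + (b * -1)
print(d.data)                      # 2.0
print(d._local_grads)              # (1, 1), from the outer addition
print(d._children[1]._local_grads) # (-1, 3.0), from the inner b * -1

p = a ** 2
print(p.data, p._local_grads)      # 25.0 (10.0,)
```

Subtraction never states its own derivative: the -1 appears as a local gradient of the inner multiplication, and the chain rule can combine it with the addition's 1 later.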
