MicroGPT Visualized

Building a GPT from scratch — an interactive visual guide

← 2.3 Recording Operations: Add and Multiply 2.5 Backward: Walking the Graph →
Step 2: Autograd › 2.4

More Operations and Syntactic Sugar

Previously Defined

  • Value wraps numbers and records operations
  • Add and multiply store local gradients

The same pattern extends to every operation our model needs. Each one has a single input and stores a single local gradient:

Power

    def __pow__(self, other): return Value(self.data**other, (self,), (other * self.data**(other-1),))

If c = a ** n, then ∂c/∂a = n * a^(n-1) — the standard power rule. Note that other here is a plain number, not a Value (we only need to differentiate with respect to the base).
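To see what actually gets recorded, here is a minimal sketch. It assumes the `Value` constructor from 2.3 takes `(data, children, local_grads)` and exposes them as attributes; the attribute names are assumptions for illustration:

```python
class Value:
    # Minimal sketch of the Value class from 2.3: wraps a number and
    # records the inputs and local gradients of the op that produced it.
    # Attribute names are assumed for illustration.
    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.children = children
        self.local_grads = local_grads

    def __pow__(self, other):
        return Value(self.data**other, (self,), (other * self.data**(other-1),))

a = Value(3.0)
c = a ** 2
print(c.data)            # 9.0
print(c.local_grads[0])  # power rule: 2 * 3.0**1 = 6.0
```

The exponent `2` never becomes a node in the graph; only the base `a` is recorded as a child.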

Log

    def log(self): return Value(math.log(self.data), (self,), (1/self.data,))

If c = log(a), then ∂c/∂a = 1/a. This is the operation that turns probabilities into the loss (via -log(prob)).
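The recorded local gradient is just the reciprocal of the input. A quick sketch with the same assumed minimal `Value` class (attribute names are assumptions for illustration):

```python
import math

class Value:
    # Minimal sketch of the Value class from 2.3 (attribute names assumed).
    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.children = children
        self.local_grads = local_grads

    def log(self):
        return Value(math.log(self.data), (self,), (1/self.data,))

p = Value(0.25)          # e.g. a predicted probability
c = p.log()
print(c.data)            # log(0.25), about -1.386
print(c.local_grads[0])  # 1 / 0.25 = 4.0
```

Note how a small probability produces a large local gradient: the closer the prediction is to zero, the harder the loss pushes back on it.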

Exp

    def exp(self): return Value(math.exp(self.data), (self,), (math.exp(self.data),))

If c = exp(a), then ∂c/∂a = exp(a) — the exponential is its own derivative. Used inside softmax.
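Because the exponential is its own derivative, the output value and the stored local gradient are the same number. A sketch with the same assumed minimal `Value` class:

```python
import math

class Value:
    # Minimal sketch of the Value class from 2.3 (attribute names assumed).
    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.children = children
        self.local_grads = local_grads

    def exp(self):
        return Value(math.exp(self.data), (self,), (math.exp(self.data),))

a = Value(1.0)
c = a.exp()
print(c.data)                      # e, about 2.718
print(c.local_grads[0] == c.data)  # True: exp is its own derivative
```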

ReLU

    def relu(self): return Value(max(0, self.data), (self,), (float(self.data > 0),))

If c = relu(a), then ∂c/∂a is 1 when a > 0, and 0 otherwise — the same binary gradient we hand-coded in Step 1.
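The two cases are easy to see side by side, again using the assumed minimal `Value` class:

```python
class Value:
    # Minimal sketch of the Value class from 2.3 (attribute names assumed).
    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.children = children
        self.local_grads = local_grads

    def relu(self):
        return Value(max(0, self.data), (self,), (float(self.data > 0),))

pos = Value(3.0).relu()
neg = Value(-2.0).relu()
print(pos.data, pos.local_grads[0])  # 3.0 1.0 -- gradient passes through
print(neg.data, neg.local_grads[0])  # 0 0.0  -- gradient is blocked
```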

The rest

    def __neg__(self): return self * -1
    def __radd__(self, other): return self + other
    def __sub__(self, other): return self + (-other)
    def __rsub__(self, other): return other + (-self)
    def __rmul__(self, other): return self * other
    def __truediv__(self, other): return self * other**-1
    def __rtruediv__(self, other): return other * self**-1

Subtraction, division, negation, and the reflected variants (which make expressions like 2 * a or 1 / a work when the plain number comes first) are all defined in terms of add, multiply, and power. They don’t need their own gradient logic — the chain rule handles them automatically through the operations they’re built from.
