More Operations and Syntactic Sugar
Previously Defined
- Value wraps numbers and records operations
- Add and multiply store local gradients
The same pattern extends to every operation our model needs. Each one has a single input and stores a single local gradient:
Power
def __pow__(self, other): return Value(self.data**other, (self,), (other * self.data**(other-1),))
If c = a ** n, then ∂c/∂a = n * a^(n-1) — the standard power rule. Note that other here is a plain number, not a Value (we only need to differentiate with respect to the base).
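As a quick sanity check (separate from the class itself), the stored local gradient can be compared against a numerical derivative:

```python
# Check the power rule numerically: d(a**n)/da should equal n * a**(n-1).
a, n, h = 3.0, 4.0, 1e-6
numeric = ((a + h) ** n - (a - h) ** n) / (2 * h)  # central difference
analytic = n * a ** (n - 1)                        # the stored local gradient
print(numeric, analytic)  # both are approximately 108.0
```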
Log
def log(self): return Value(math.log(self.data), (self,), (1/self.data,))
If c = log(a), then ∂c/∂a = 1/a. This is the operation that turns probabilities into the loss (via -log(prob)).
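Concretely, the gradient of the loss -log(p) with respect to a probability p is log's local gradient 1/p scaled by the outer -1, which a finite difference confirms:

```python
import math

p = 0.25            # predicted probability of the correct class
analytic = -1 / p   # chain rule: outer -1 times log's local gradient 1/p
h = 1e-6
numeric = (-math.log(p + h) - (-math.log(p - h))) / (2 * h)
print(analytic)  # -4.0
```

Note how the gradient grows as p shrinks: confident wrong predictions get pushed hard.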
Exp
def exp(self): return Value(math.exp(self.data), (self,), (math.exp(self.data),))
If c = exp(a), then ∂c/∂a = exp(a) — the exponential is its own derivative. Used inside softmax.
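The self-derivative property is easy to confirm numerically at any point:

```python
import math

# The numerical derivative of exp matches exp itself at every point.
h = 1e-6
for a in (-1.0, 0.0, 2.5):
    numeric = (math.exp(a + h) - math.exp(a - h)) / (2 * h)
    print(round(numeric, 4), round(math.exp(a), 4))  # the two columns match
```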
ReLU
def relu(self): return Value(max(0, self.data), (self,), (float(self.data > 0),))
If c = relu(a), then ∂c/∂a is 1 when a > 0, and 0 otherwise — the same binary gradient we hand-coded in Step 1.
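The expression float(self.data > 0) is exactly that step function. A small standalone check (relu_local_grad is a hypothetical helper, not part of the class):

```python
# relu's local gradient is a step function: 1 for positive inputs, 0 otherwise.
def relu_local_grad(a):
    return float(a > 0)

print([relu_local_grad(a) for a in (-2.0, -0.5, 0.0, 0.5, 2.0)])
# [0.0, 0.0, 0.0, 1.0, 1.0]
```

At exactly a = 0 the derivative is mathematically undefined; the code picks 0, which is the usual convention.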
The rest
def __neg__(self): return self * -1
def __radd__(self, other): return self + other
def __sub__(self, other): return self + (-other)
def __rsub__(self, other): return other + (-self)
def __rmul__(self, other): return self * other
def __truediv__(self, other): return self * other**-1
def __rtruediv__(self, other): return other * self**-1
Subtraction, division, and negation are all defined in terms of add, multiply, and power. They don’t need their own gradient logic — the chain rule handles them automatically through the operations they’re built from.
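To see the chain rule do that work, here is a minimal, self-contained sketch of the kind of Value class assumed throughout. The constructor signature and the simple backward() are assumptions for illustration, not the definitive implementation:

```python
class Value:
    """Minimal sketch: stores data, child nodes, and one local gradient per child."""
    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._local_grads = local_grads

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def __pow__(self, other):  # other is a plain number
        return Value(self.data ** other, (self,), (other * self.data ** (other - 1),))

    # Sugar: no gradient logic of its own, just add/mul/pow compositions.
    def __neg__(self): return self * -1
    def __sub__(self, other): return self + (-other)
    def __truediv__(self, other): return self * other ** -1

    def backward(self):
        # Build a topological order, then apply the chain rule in reverse:
        # each child accumulates parent.grad times the stored local gradient.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            for child, local in zip(v._children, v._local_grads):
                child.grad += v.grad * local

a, b = Value(6.0), Value(2.0)
c = a / b - b          # built entirely from mul, pow, add, and neg
c.backward()
print(a.grad, b.grad)  # 0.5 -2.5
```

Here dc/da = 1/b = 0.5 and dc/db = -a/b² - 1 = -2.5; the sugar methods never touched a gradient, yet backward() recovers both correctly through the primitives they expand into.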