Implement a minimal scalar-valued reverse-mode autograd engine (the 100-LOC Karpathy-style micrograd), then use it to fit y = 2x + 3 on noisy data by running gradient descent for 200 steps. Do it in Python, Go, OR Rust — not all three. Submit code plus a written paragraph answering: what would break if you processed your topological sort forward instead of in reverse?
a.grad += b.data * out.grad. Write this out on paper first.step 0 loss=18.2411 w=0.182 b=0.058
step 40 loss=0.1438 w=1.902 b=2.804
step 80 loss=0.0938 w=1.984 b=2.962
step 120 loss=0.0911 w=1.995 b=2.988
step 160 loss=0.0910 w=1.996 b=2.991
final: w=1.996, b=2.991 (truth: w=2, b=3)
tanh op with its backward 1 - tanh^2(x). Use it to fit a non-linear function like y = tanh(x).x**n with backward n * x**(n-1) * out.grad.