A single neuron is a dot product (plus a bias) followed by a non-linearity. A linear layer is many dot products bundled as a matrix multiply. Attention scores are dot products between queries and keys. Word similarity is a dot product between embeddings. If you understand the dot product (a weighted sum) and that matrix multiplication is just many dot products stacked, you have covered most of the core arithmetic inside an LLM.
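Each claim in the paragraph above can be checked in a few lines of plain Python. This is a minimal sketch under a few assumptions: matrices are lists of rows, ReLU is picked as the non-linearity, and the function names are hypothetical. The 1/sqrt(d) scaling in `attention_scores` follows the standard Transformer formulation, and the softmax and weighted sum over values are left out.

```python
import math


def dot(a, b):
    """Weighted sum: multiply matching entries, then add them up."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def neuron(x, w, bias):
    """One neuron: dot product plus bias, then a non-linearity (ReLU chosen here)."""
    return max(0.0, dot(x, w) + bias)


def linear_layer(x, W, b):
    """Many dot products at once: one per row of W (one row per output unit)."""
    return [dot(x, w_row) + b_i for w_row, b_i in zip(W, b)]


def matmul(A, B):
    """C[i][j] is the dot product of row i of A with column j of B."""
    cols = list(zip(*B))  # transpose B so its columns are easy to iterate
    return [[dot(row, col) for col in cols] for row in A]


def attention_scores(Q, K):
    """Scaled dot-product scores: S[i][j] = q_i . k_j / sqrt(d).
    Softmax and the weighted sum over values are omitted."""
    d = len(Q[0])
    return [[dot(q, k) / math.sqrt(d) for k in K] for q in Q]


print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))                 # [[19, 22], [43, 50]]
print(linear_layer([1, 2], [[1, 0], [0, 1], [1, 1]], [0, 0, 1]))  # [1, 2, 4]
```

Every function reduces to `dot`. That is the point: a layer, a matrix multiply and an attention score table are all the same weighted sum applied to different pairs of vectors.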
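The dot product of two vectors a and b is sum(a_i * b_i). Geometrically it equals |a| * |b| * cos(θ), where θ is the angle between the vectors. Dividing out the two lengths leaves just cos(θ), which is why the normalized dot product (cosine similarity) measures how aligned two vectors are, regardless of how long they are. It is also how the "king - man + woman ≈ queen" analogy is read off in embedding space: after the vector arithmetic, the word whose embedding has the highest cosine similarity to the result (excluding king, man and woman themselves) turns out to be queen.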
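The geometric form follows from the law of cosines applied to the triangle with sides a, b and a − b (θ is the angle between a and b, both assumed nonzero). Writing |a − b|² two ways and equating them gives the identity; the last line shows why cosine similarity ignores a positive rescaling of either vector.

```latex
\begin{aligned}
\lvert a - b \rvert^2 &= \lvert a \rvert^2 + \lvert b \rvert^2 - 2\,\lvert a \rvert\,\lvert b \rvert \cos\theta
  && \text{(law of cosines)} \\
\lvert a - b \rvert^2 &= (a - b)\cdot(a - b) = a\cdot a - 2\,a\cdot b + b\cdot b
  = \lvert a \rvert^2 + \lvert b \rvert^2 - 2\,a\cdot b
  && \text{(expand using } a\cdot b = \textstyle\sum_i a_i b_i\text{)} \\
\Longrightarrow\quad a\cdot b &= \lvert a \rvert\,\lvert b \rvert \cos\theta,
  \qquad \cos\theta = \frac{a\cdot b}{\lvert a \rvert\,\lvert b \rvert} \\
\frac{(k a)\cdot b}{\lvert k a \rvert\,\lvert b \rvert} &= \frac{k\,(a\cdot b)}{k\,\lvert a \rvert\,\lvert b \rvert}
  = \frac{a\cdot b}{\lvert a \rvert\,\lvert b \rvert}
  && (k > 0)
\end{aligned}
```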
Below: the dot product and cosine similarity from scratch in Python, run on a tiny pretend-embedding example.
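Try it:
- … queen and observe what happens to cosine similarity. Can you make it negative?
- Change king's 3rd coordinate from 0.1 to 0.95 (making it more car-like). Re-run both cosines and watch the gap close: with these numbers cos(king, queen) falls to about 0.69 and cos(king, car) rises to about 0.67, so the two nearly cross.

Use these three prompts in order. Each builds on the one before.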
Define the dot product and cosine similarity. Give one non-ML example and one ML example where each is used.
Explain why `a · b = |a||b|cos(θ)`. Walk through the derivation using the law of cosines, then explain why dividing by magnitudes gives a scale-invariant similarity.
In an LLM, when is a dot product NOT the right similarity? Name two: (a) when embeddings aren't normalized and lengths encode frequency, (b) in IR with learned-sparse vectors. Explain what to use instead.
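Point (a) of the last prompt can be seen with made-up numbers. The sketch below assumes, for illustration only, that a longer vector stands for a more frequent word; `query`, `rare_word` and `frequent_word` are hypothetical vectors. The raw dot product ranks the long but poorly aligned vector first, cosine similarity ranks the well-aligned one first, and rescaling a vector leaves its cosine unchanged.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))


query = [1.0, 1.0, 0.0]
rare_word = [0.9, 1.0, 0.1]      # hypothetical: well aligned with the query, short
frequent_word = [3.0, 0.0, 3.0]  # hypothetical: poorly aligned, but long ("frequent")

for name, vec in [("rare", rare_word), ("frequent", frequent_word)]:
    print(f"{name:8s} dot={dot(query, vec):.2f} cos={cosine(query, vec):.2f}")

# Scaling a vector by 5 multiplies its dot product by 5 but leaves its cosine alone.
scaled = [5 * x for x in rare_word]
print(dot(query, scaled) / dot(query, rare_word), cosine(query, scaled) - cosine(query, rare_word))
```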
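```python
# main.py: dot product and cosine similarity
import math


def dot(a, b):
    """Sum of element-wise products: sum(a_i * b_i)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Dot product divided by both lengths, i.e. cos of the angle between a and b."""
    na = math.sqrt(dot(a, a))
    nb = math.sqrt(dot(b, b))
    if na == 0 or nb == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / (na * nb)


# pretend-embeddings (4 dimensions, values made up for illustration)
king = [0.9, 0.8, 0.1, 0.0]
queen = [0.85, 0.85, 0.15, 0.9]
car = [0.1, 0.0, 0.95, 0.1]

print("dot(king, queen):", dot(king, queen))
print("cos(king, queen):", cosine(king, queen))  # high
print("cos(king, car):", cosine(king, car))      # low
```

Run it with `python3 main.py`.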
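The "king - man + woman ≈ queen" analogy can be reproduced on hand-made vectors. This is a toy sketch, not how real embeddings are built: the 3-dimensional vectors and their labelled axes (royalty, maleness, femaleness) are hypothetical, and `analogy` is an illustrative helper. Following common practice in word-analogy evaluation, the three query words are excluded from the candidates, since the result vector usually sits close to one of them.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0 or nb == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / (na * nb)


# Hypothetical toy embeddings; axes are (royalty, maleness, femaleness).
# Real embeddings have hundreds of dimensions with no hand-assigned meaning.
vocab = {
    "king":     [0.9, 0.9, 0.1],
    "queen":    [0.9, 0.1, 0.9],
    "princess": [0.6, 0.1, 0.9],
    "man":      [0.1, 0.9, 0.1],
    "woman":    [0.1, 0.1, 0.9],
    "car":      [0.0, 0.1, 0.1],
}


def analogy(a, b, c, vocab):
    """Solve 'a - b + c ≈ ?' by cosine nearest neighbour, skipping the query words."""
    target = [x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c])]
    candidates = (w for w in vocab if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(target, vocab[w]))


print(analogy("king", "man", "woman", vocab))  # queen
```

Note that "princess" also scores highly here (about 0.98 against roughly 1.0 for queen): cosine ranks by direction, so near-synonyms crowd the top of the list.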