Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.
An embedding is a dense vector representation of discrete data (a word, sentence, image, or code snippet) that encodes semantic meaning in its geometry: similar things are close together in vector space. This is why vector arithmetic such as 'king - man + woman ≈ queen' roughly holds for word embeddings, why a semantic search over 10M documents can take milliseconds (it reduces to a nearest-neighbor lookup over vectors, usually served by an approximate index), and why RAG (retrieval-augmented generation) works at all. Embeddings are the foundation of modern AI infrastructure: search, recommendation, anomaly detection, and every RAG pipeline depend on them.
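To make "nearest-neighbor over vectors" concrete, here is a minimal sketch of exact (brute-force) top-k search using NumPy only. The random vectors stand in for real embeddings, and the helper name `top_k_cosine` is made up for this illustration. Large systems typically replace the full scan with an approximate index, which this sketch does not attempt.

```python
import numpy as np

def top_k_cosine(query_vec, doc_vecs, k=3):
    """Return (indices, scores) of the k documents most similar to the query."""
    # Unit-normalize so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                       # one score per document
    top = np.argsort(scores)[::-1][:k]   # highest score first
    return top, scores[top]

# Assumption: random vectors as stand-ins for real 384-dimensional embeddings.
rng = np.random.default_rng(0)
doc_vecs = rng.normal(size=(1000, 384))
# The query is a slightly noisy copy of document 42, so 42 should come back first.
query_vec = doc_vecs[42] + 0.1 * rng.normal(size=384)

idx, top_scores = top_k_cosine(query_vec, doc_vecs, k=3)
print(idx, top_scores)
```

The whole search is one matrix-vector product plus a sort, which is why it is fast for small corpora and why the cost grows linearly with corpus size unless an index is used.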
A sentence embedding encodes semantic meaning as a point in high-dimensional space, so cosine similarity between two vectors directly measures how related the sentences are — even with no shared words. The demo encodes a small corpus with all-MiniLM-L6-v2 and ranks documents against a query, revealing exactly how semantic search differs from keyword matching.
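As a small illustration of what cosine similarity measures, the sketch below uses hand-picked toy vectors (not real embeddings) to show that it depends on direction only, not on vector length, while Euclidean distance is sensitive to both. The function name `cosine_similarity` is defined here for the example.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: dot product divided by both lengths."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors chosen by hand for illustration.
a = np.array([1.0, 2.0, 3.0])
b = 10.0 * a                     # same direction as a, ten times longer
c = np.array([-3.0, 0.0, 1.0])   # orthogonal to a: 1*(-3) + 2*0 + 3*1 = 0

cos_ab = cosine_similarity(a, b)     # same direction, so close to 1
cos_ac = cosine_similarity(a, c)     # orthogonal, so 0
dist_ab = float(np.linalg.norm(a - b))  # large, despite identical direction

# For unit vectors the two measures carry the same information:
# ||u - v||^2 = 2 - 2 * cos(u, v)
u = a / np.linalg.norm(a)
v = c / np.linalg.norm(c)
sq_dist_uv = float(np.sum((u - v) ** 2))

print(f"{cos_ab:.3f} {cos_ac:.3f} {dist_ab:.2f} {sq_dist_uv:.3f}")
```

Because the two measures agree on unit vectors, normalizing embeddings first means ranking by cosine similarity and ranking by Euclidean distance give the same order.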
Run the query 'What is supervised learning?' against the corpus. Check that sentences about gradient descent and backprop rank higher than the one about dogs. This is semantic search: it works even without keyword overlap.

Add 'Stochastic gradient descent updates weights using a mini-batch of samples.' to the corpus. Re-run the query about neural network learning. Does it rank above backpropagation? Why or why not?

Compute the pairwise similarity matrix corpus_emb_n @ corpus_emb_n.T. Print it. Verify that the two ML-related sentences about gradient descent and backprop are more similar to each other than to the dogs sentence.

Change the query to 'Animals as pets'. Verify the dogs sentence now ranks highest. This demonstrates that ranking is query-sensitive: the document embeddings stay fixed, but the same document scores differently for different queries.

Use these three prompts in order. Each builds on the one before.
In one paragraph, explain what a sentence embedding is and why two semantically similar sentences have similar embedding vectors even if they share no words.
Walk me through how a sentence-transformer model converts a variable-length sentence into a fixed-size vector. What is mean pooling, and why is it applied to the token embeddings? Why is cosine similarity preferred over Euclidean distance for comparing embeddings?
I need to build a semantic search system over 5M product descriptions. Walk me through the full stack: embedding model choice (all-MiniLM vs BGE vs OpenAI text-embedding-3-small), vector database options (Pinecone vs Weaviate vs pgvector), approximate nearest neighbor algorithms (HNSW vs IVF), and batched indexing strategy for 5M items. What's the approximate storage size and p95 query latency I should expect?
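The storage part of the last prompt can be sanity-checked with plain arithmetic before asking. The sketch below assumes vectors are stored as uncompressed float32 (4 bytes per dimension) and counts raw vectors only; index structures such as HNSW graphs, metadata, and replicas add overhead that is not estimated here. The 384 dimension matches all-MiniLM-L6-v2; 1536 is assumed as the default output size of text-embedding-3-small. The helper name `raw_vector_gb` is made up for this example.

```python
def raw_vector_gb(n_items, dim, bytes_per_value=4):
    """Raw storage for n_items vectors of size dim, in decimal gigabytes."""
    return n_items * dim * bytes_per_value / 1e9

n_items = 5_000_000

# all-MiniLM-L6-v2 produces 384-dimensional vectors.
minilm_gb = raw_vector_gb(n_items, 384)

# Assumption: 1536 dimensions for text-embedding-3-small at its default setting.
openai_small_gb = raw_vector_gb(n_items, 1536)

print(f"384-dim:  {minilm_gb:.2f} GB")        # 7.68 GB
print(f"1536-dim: {openai_small_gb:.2f} GB")  # 30.72 GB
```

Query latency depends on the index type, hardware, and recall target, so it is better measured on a sample of the real data than estimated this way.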
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # 80MB, fast and good

corpus = [
    "Python is a general-purpose programming language.",
    "Gradient descent minimizes the loss function iteratively.",
    "The attention mechanism lets every token attend to every other token.",
    "Dogs are loyal and friendly animals.",
    "Backpropagation computes gradients through the chain rule.",
]
query = "How does a neural network learn?"

# Encode each text into a 384-dimensional vector
corpus_emb = model.encode(corpus)   # (5, 384)
query_emb = model.encode([query])   # (1, 384)

# Cosine similarity = dot product when vectors are unit-normalized
corpus_emb_n = corpus_emb / np.linalg.norm(corpus_emb, axis=1, keepdims=True)
query_emb_n = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
scores = (query_emb_n @ corpus_emb_n.T).squeeze()  # (5,), one score per document

# Print documents from most to least similar to the query
for i in np.argsort(scores)[::-1]:
    print(f"{scores[i]:.3f} {corpus[i]}")

Save the script as main.py and run it:
python3 main.py
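The call to model.encode hides how a variable-length sequence of token vectors becomes one fixed-size sentence vector. A common approach, and the one the second prompt asks about, is masked mean pooling: average the token vectors while ignoring padding positions. The sketch below uses NumPy with tiny made-up token vectors; the function name `mean_pool` and the input values are illustrative, and this is not a copy of the library's internals.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sentence, skipping padded positions.

    token_embeddings: float array of shape (batch, seq_len, dim)
    attention_mask:   int array of shape (batch, seq_len); 1 = real token, 0 = padding
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)    # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)    # real-token count, guarded against 0
    return summed / counts                            # (batch, dim)

# Two "sentences" padded to 3 tokens, with 2-dimensional token vectors (toy values).
token_embeddings = np.array([
    [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],   # third position is padding and must be ignored
    [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]],   # all three tokens are real
])
attention_mask = np.array([
    [1, 1, 0],
    [1, 1, 1],
])

sentence_vecs = mean_pool(token_embeddings, attention_mask)
print(sentence_vecs)  # rows: [2. 3.] and [4. 2.]
```

Without the mask, the padding vector [9, 9] would pull the first sentence's average away from its real tokens, so sentences of different lengths in the same batch would not be comparable.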