Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.
Software, left untouched, mostly keeps working. ML systems left untouched silently decay — and understanding the mechanisms is what turns monitoring from a nice-to-have into a survival requirement. The world the model was trained on drifts away from the world it now serves: user behavior shifts, an upstream data source changes its format, a category that was rare becomes common. The model's weights are frozen but reality is not, so accuracy erodes without any alert firing. For LLM apps the same rot hits differently — a provider deprecates the model you pinned, a prompt that worked degrades as the underlying model updates, your retrieval corpus goes stale. Naming these decay modes is the first step to detecting them.
The demo simulates concept drift: a model trained on one input distribution is scored against a slowly shifting one, and you watch accuracy decay over time even though the model never changed — the quiet failure MLOps is built to catch.
Use these three in order. Each builds on the one before.
In one paragraph, explain why machine learning models get worse over time even when nobody changes them.
Walk me through the difference between data drift, concept drift, and upstream schema changes, and how each degrades a deployed model.
Given an LLM-powered feature in production, explain the specific ways it can rot that a classic ML model wouldn't, and how I'd detect each.
import numpy as np
rng = np.random.default_rng(0)
# A toy classifier learned the boundary x > 0.5 from TRAINING data centered at 0.5
def model(x): return (x > 0.5).astype(int)
# But the live input distribution slowly DRIFTS upward over 12 weeks.
def true_label(x): return (x > 0.5).astype(int) # ground truth boundary is stable...
for week in range(0, 12, 3):
mean = 0.5 + 0.04 * week # ...but inputs drift, so the data the model SEES changes
x = rng.normal(mean, 0.15, 5000).clip(0, 1)
acc = (model(x) == true_label(x)).mean()
print(f"week {week:>2}: input mean={mean:.2f} accuracy={acc:.3f}")
# Accuracy near the boundary degrades as the population shifts — no code or weights changed.python3 main.py