Capstok — learn by doing

Why this matters

Software, left untouched, mostly keeps working. ML systems left untouched silently decay — and understanding the mechanisms is what turns monitoring from a nice-to-have into a survival requirement. The world the model was trained on drifts away from the world it now serves: user behavior shifts, an upstream data source changes its format, a category that was rare becomes common. The model's weights are frozen but reality is not, so accuracy erodes without any alert firing. For LLM apps the same rot hits differently — a provider deprecates the model you pinned, a prompt that worked degrades as the underlying model updates, your retrieval corpus goes stale. Naming these decay modes is the first step to detecting them.
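One way to make this kind of silent shift visible is to compare live inputs against a training-time reference sample. The sketch below uses the Population Stability Index (PSI), a technique chosen here for illustration and not one this lesson prescribes. The function name, the bin count, and the synthetic samples are all assumptions.

```python
import numpy as np

def population_stability_index(reference, live, bins=10):
    """PSI between a reference sample and a live sample of one numeric feature."""
    # Interior bin edges come from reference quantiles, so each bin
    # holds roughly 1/bins of the reference sample.
    interior_edges = np.quantile(reference, np.linspace(0, 1, bins + 1))[1:-1]
    # searchsorted maps each value to a bin index in 0..bins-1;
    # values beyond the reference range fall into the two end bins.
    ref_bins = np.searchsorted(interior_edges, reference)
    live_bins = np.searchsorted(interior_edges, live)
    ref_frac = np.bincount(ref_bins, minlength=bins) / len(reference)
    live_frac = np.bincount(live_bins, minlength=bins) / len(live)
    # Floor the fractions so an empty bin does not produce log(0).
    ref_frac = np.clip(ref_frac, 1e-6, None)
    live_frac = np.clip(live_frac, 1e-6, None)
    return float(np.sum((live_frac - ref_frac) * np.log(live_frac / ref_frac)))

rng = np.random.default_rng(0)
reference = rng.normal(0.5, 0.15, 5000)  # stand-in for training-time inputs
same = rng.normal(0.5, 0.15, 5000)       # live inputs with no drift
shifted = rng.normal(0.86, 0.15, 5000)   # live inputs after the mean has drifted

psi_same = population_stability_index(reference, same)
psi_shifted = population_stability_index(reference, shifted)
print(f"no drift: PSI={psi_same:.3f}   drifted: PSI={psi_shifted:.3f}")
```

A commonly cited rule of thumb treats a PSI above about 0.25 as a significant shift, but the right alert threshold depends on the feature and should be tuned for your own data.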

Demo

The demo simulates data drift (covariate shift): a model trained on one input distribution is scored against a slowly shifting one, and you watch accuracy decay over time even though the model never changed. This is the quiet failure MLOps is built to catch.

Try it yourself

Run it and watch accuracy fall as the input mean drifts — nothing in the model changed.
Increase the drift rate (0.04 → 0.08) and see decay accelerate.
List three real drift sources for your own app (user behavior, upstream schema, provider model update).
For an LLM app specifically, name the rot mode that has no classic-ML analog (a pinned model gets deprecated).
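For the last exercise, a pinned-model check can be as simple as comparing the pin against retirement dates you record yourself. This is a minimal sketch, not a provider integration: the model IDs, the dates, the `RETIREMENT_DATES` table, and the `pin_status` helper are all hypothetical, and no provider API is called.

```python
from datetime import date

# Hypothetical retirement schedule, maintained by hand from the
# provider's deprecation notices. None means no retirement announced.
RETIREMENT_DATES = {
    "example-model-v1": date(2025, 6, 30),
    "example-model-v2": None,
}

def pin_status(model_id, today, warn_days=60):
    """Classify a pinned model ID as ok, retiring-soon, retired, or unknown."""
    if model_id not in RETIREMENT_DATES:
        return "unknown"  # the pin is not tracked at all, which is itself a warning sign
    retirement = RETIREMENT_DATES[model_id]
    if retirement is None:
        return "ok"
    days_left = (retirement - today).days
    if days_left < 0:
        return "retired"
    if days_left <= warn_days:
        return "retiring-soon"
    return "ok"

# A fixed date keeps the example deterministic; a scheduled job would pass date.today().
print(pin_status("example-model-v1", date(2025, 6, 1)))
```

Running a check like this on a schedule turns a provider deprecation from a surprise outage into an alert with lead time.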

Prompt your AI

Use these three in order. Each builds on the one before.

1. Basics & terminology

In one paragraph, explain why machine learning models get worse over time even when nobody changes them.

2. Why it works (the mechanism)

Walk me through the difference between data drift, concept drift, and upstream schema changes, and how each degrades a deployed model.

3. Advanced — application & what's next

Given an LLM-powered feature in production, explain the specific ways it can rot that a classic ML model wouldn't, and how I'd detect each.

import numpy as np
rng = np.random.default_rng(0)

# A toy classifier learned the single boundary x > 0.5 from TRAINING data centered at 0.5.
def model(x): return (x > 0.5).astype(int)

# The ground truth is stable. Assumed for this demo: it has a second boundary at 0.8 that the
# training data (mean 0.5, sd 0.15) rarely reached, so the model never learned it.
def true_label(x): return ((x > 0.5) & (x < 0.8)).astype(int)

# The live input distribution slowly DRIFTS upward over 12 weeks.
for week in range(0, 12, 3):
    mean = 0.5 + 0.04 * week            # inputs drift, so the data the model SEES changes
    x = rng.normal(mean, 0.15, 5000).clip(0, 1)
    acc = (model(x) == true_label(x)).mean()
    print(f"week {week:>2}: input mean={mean:.2f}  accuracy={acc:.3f}")
# Accuracy falls as more inputs land above 0.8, where the model is wrong. No code or weights changed.

Run: python3 main.py
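To compare two drift rates side by side, as the second exercise suggests, the loop can be wrapped in a function. This is a sketch under one loudly stated assumption: the true label has a second boundary at 0.8 that the toy model never learned, so inputs drifting upward land where the model is wrong. The `weekly_accuracy` helper and its parameters are illustrative names, not part of the lesson's code.

```python
import numpy as np

WEEKS = (0, 3, 6, 9)

def weekly_accuracy(drift_rate, weeks=WEEKS, seed=0):
    """Accuracy of the frozen toy model at each week for a given drift rate."""
    rng = np.random.default_rng(seed)
    results = []
    for week in weeks:
        mean = 0.5 + drift_rate * week
        x = rng.normal(mean, 0.15, 5000).clip(0, 1)
        predicted = (x > 0.5).astype(int)             # frozen model: single threshold
        actual = ((x > 0.5) & (x < 0.8)).astype(int)  # assumed ground truth with a band
        results.append(float((predicted == actual).mean()))
    return results

slow = weekly_accuracy(0.04)
fast = weekly_accuracy(0.08)
for week, s, f in zip(WEEKS, slow, fast):
    print(f"week {week:>2}: rate 0.04 -> {s:.3f}   rate 0.08 -> {f:.3f}")
```

Both runs start from the same accuracy at week 0, and the faster drift rate reaches low accuracy in fewer weeks, which is the acceleration the exercise asks you to observe.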

Why ML systems rot