What a model actually is — parameters, inputs, outputs

easy

Why this matters

Most ML tutorials start with model.fit(X, y) and never explain what's inside. A model is a function with learnable parameters. In the simplest case, a single neuron, it multiplies the input features by weights, adds a bias, and passes the result through a non-linearity. Seeing a model as f(x; θ) rather than as magic, where x is the input and θ collects the parameters, makes it easier to read architecture diagrams, debug shape mismatches, and see why 'training' is just parameter optimization. Skip this and you'll cargo-cult .fit() calls without knowing what they actually do.

Demo

A machine learning model is just a function with tunable numbers: for a single neuron those numbers are one weight vector and one bias. Writing it by hand in NumPy, before any framework gets involved, makes the computation concrete: a dot product, an addition, and a non-linearity. Architectures from logistic regression to GPT build on this same skeleton.
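
The computation described above can be sketched as follows. This is a minimal illustration rather than the lesson's original demo: the variable names w, b, x, z, and y_hat follow the exercises in this lesson, and the numeric values are borrowed from the worked prompt further down, so treat them as assumptions chosen for convenience.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Parameters (theta). In a trained model these are found by optimisation;
# here they are set by hand purely for illustration.
w = np.array([0.5, -1.2, 0.8])   # one weight per input feature
b = 0.3                          # bias

# Input: a single example with three features.
x = np.array([2.0, 1.0, 0.5])

# Forward pass: dot product, add bias, apply the non-linearity.
z = np.dot(w, x) + b     # 2*0.5 + 1*(-1.2) + 0.5*0.8 + 0.3, about 0.5
y_hat = sigmoid(z)       # about 0.622

print("z     =", z)
print("y_hat =", y_hat)
```

Nothing here is learned yet. Training would only change the numbers stored in w and b; the three lines of the forward pass stay the same.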

Try it yourself

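Set w = np.zeros(3) and b = 0.0. What does y_hat become? This is the zero-initialised baseline: with every parameter at zero, z = 0 and the model predicts sigmoid(0) = 0.5 regardless of input. If you leave b unchanged, the output is sigmoid(b) instead.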
Set x = np.zeros(3). Show that no matter what w is, y_hat depends only on b. This reveals the role of the bias term.
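Swap sigmoid for a linear pass-through (lambda z: z). Feed x = np.array([1, 0, 0]) and verify that y_hat equals w[0] + b (use np.isclose rather than == to allow for floating-point rounding). With an identity activation the neuron computes the same function as a linear regression prediction.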
Print w.shape, x.shape, and np.dot(w, x). Confirm the shapes are compatible — this is the first shape-matching constraint you'll hit in every real model.
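
One way to check your answers to the four experiments is sketched below. The neuron helper and the parameter values are hypothetical, introduced only for this check; try the exercises by hand first.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, activation=sigmoid):
    """Hypothetical helper: activation(w . x + b) for a single neuron."""
    return activation(np.dot(w, x) + b)

# Illustrative values (assumed, not prescribed by the lesson).
w = np.array([0.5, -1.2, 0.8])
b = 0.3
x = np.array([2.0, 1.0, 0.5])

# 1. Zero weights: the input is ignored and the output is sigmoid(b).
#    It is exactly 0.5 only when b is also zero.
zero_w = neuron(x, np.zeros(3), b)
zero_w_zero_b = neuron(x, np.zeros(3), 0.0)

# 2. Zero input: the weights drop out, again leaving sigmoid(b).
zero_x = neuron(np.zeros(3), w, b)

# 3. Identity activation with a one-hot input picks out w[0] + b.
linear = neuron(np.array([1.0, 0.0, 0.0]), w, b, activation=lambda z: z)

# 4. Shapes: two length-3 vectors produce a scalar dot product.
print(w.shape, x.shape, np.dot(w, x).shape)

print(zero_w, zero_w_zero_b, zero_x, linear)
```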

Prompt your AI

Use these three in order. Each builds on the one before.

1. Basics & terminology

In one paragraph, explain what a machine learning model's parameters are. Where do they come from — are they set by the programmer, or found automatically? Use the single-neuron example as a concrete reference.

2. Why it works (the mechanism)

Walk me through what happens numerically when the neuron receives `x = [2, 1, 0.5]`, `w = [0.5, -1.2, 0.8]`, `b = 0.3`: show each step — dot product, add bias, sigmoid. Why use sigmoid instead of outputting z directly?

3. Advanced — application & what's next

Distinguish a model's architecture (layer count, activation shapes) from its weights (the learned numbers). Why does this distinction matter when saving a checkpoint, fine-tuning a pretrained model, or debugging a NaN loss? Give one concrete example for each scenario.