Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.
Most of the aesthetics you want to reach (Wes Anderson symmetry, 90s anime, Blade Runner neon, Ghibli pastels) are already patterns the model learned. The trick is calling them out clearly and stacking enough signal that the model doesn't hedge. One style word is weak. Three to five carefully chosen signals are strong. Ten is usually too many.
Style signals work by narrowing the probability distribution the model samples from — each additional term rules out aesthetics that don't fit. A single director's name is a weak signal because models hedge; three to five specific visual signals (composition style, grain type, color era) create enough overlap in training data to lock in the look. The four levels below demonstrate the progressive narrowing.
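One way to picture the narrowing is a toy filter. The sketch below is an illustration only: the catalogue, its trait tags, and the `surviving_looks` helper are all invented for this example, and real models do not store styles as tag sets. It only mimics how each added signal rules out looks that don't fit.

```python
# Toy illustration: a hand-made catalogue of looks, each tagged with visual traits.
# The names and tags are hypothetical, chosen only to show the narrowing effect.
CATALOGUE = {
    "wes_anderson": {"symmetrical framing", "pastel palette", "dead-center subject", "film grain"},
    "ghibli": {"pastel palette", "hand-drawn", "soft light"},
    "blade_runner": {"neon", "rain", "film grain", "high contrast"},
    "90s_anime": {"hand-drawn", "cel shading", "film grain"},
    "sitcom": {"symmetrical framing", "flat lighting"},
}


def surviving_looks(signals):
    """Return the looks whose traits include every requested signal."""
    wanted = set(signals)
    return sorted(name for name, traits in CATALOGUE.items() if wanted <= traits)


stack = ["film grain", "pastel palette", "symmetrical framing"]
for n in range(len(stack) + 1):
    # Apply the first n signals and show which looks are still compatible.
    print(n, "signal(s):", surviving_looks(stack[:n]))
```

With no signals every look in the toy catalogue survives; "film grain" alone still leaves three candidates, and adding "pastel palette" leaves only one. That is the "stacking" effect in miniature.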
L1 (weak): A quiet street at dusk.
L2 (named): A quiet street at dusk, Wes Anderson style.
L3 (stacked): A quiet street at dusk — Wes Anderson composition, symmetrical framing, pastel palette, slight film grain, dead-center subject.
L4 (anchored): … plus: 35mm, Kodak Portra 400 color, hint of vignetting.
Use these three in order. Each builds on the one before.
Explain the difference between naming a style ('Wes Anderson') and describing a style by signals ('symmetry, pastel palette, dead-center subject'). When is each better?
Video models match your prompt against patterns in their training set. Why does stacking 4–5 style signals usually work better than naming one director, but stacking 10 signals often collapses? What's happening at each stage?
Give me a workflow for capturing the look of a film I care about — watch, extract signals, write the prompt. Then have me validate my signal list by regenerating and comparing to a reference frame.
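If you keep your extracted signals in a list, the weak, named, stacked, and anchored levels from earlier in this lesson can be assembled with a small helper. This is a minimal sketch, not a tool any video model requires: `build_prompt`, its parameters, and the `soft_limit` of 5 are assumptions made for illustration, with the limit taken loosely from the "three to five signals" guidance above.

```python
import warnings


def build_prompt(scene, signals=(), anchors=(), soft_limit=5):
    """Join a scene, style signals, and film-stock anchors into one prompt string.

    soft_limit is a hypothetical threshold; it only triggers a warning.
    """
    if len(signals) > soft_limit:
        warnings.warn(f"{len(signals)} style signals; the look may get muddy")
    parts = list(signals) + list(anchors)
    scene = scene.rstrip(".")
    if not parts:
        return scene + "."
    return scene + ", " + ", ".join(parts) + "."


scene = "A quiet street at dusk"
stacked = [
    "Wes Anderson composition",
    "symmetrical framing",
    "pastel palette",
    "slight film grain",
    "dead-center subject",
]
anchors = ["35mm", "Kodak Portra 400 color", "hint of vignetting"]

level1 = build_prompt(scene)                          # weak
level2 = build_prompt(scene, ["Wes Anderson style"])  # named
level3 = build_prompt(scene, stacked)                 # stacked
level4 = build_prompt(scene, stacked, anchors)        # anchored

for prompt in (level1, level2, level3, level4):
    print(prompt)
```

Keeping signals and anchors in separate lists makes the validation step easier: you can drop or swap one signal at a time, regenerate, and compare against your reference frame.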