Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.
The vocabulary of cinematography (wide shot, close-up, over-the-shoulder, rack focus, dolly-in) is a 120-year-old prompt specification language. AI video models have been trained on decades of labeled footage; they know these terms. Saying 'medium close-up, 35mm lens, shallow depth of field, key light from camera left' gives you far more control than 'a person talking.' This task drills the vocabulary and its effect on generation quality.
Three axes: shot size (ELS, LS, MLS, MS, MCU, CU, XCU), angle (eye-level, high, low, Dutch, bird's-eye, POV), and focal length (14mm, 24mm, 35mm, 50mm, 85mm, 135mm). Combining all three precisely gets you the shot you wanted; combining them vaguely gets you a cousin of it. Cinematographers have spent a century associating each combination with emotional tone, so leverage that lineage.
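As a rough illustration of combining the three axes, the sketch below joins one term from each axis with a subject description. The function name `build_shot_prompt`, the table names, and the output wording are all assumptions made for this example; no particular video tool requires this format.

```python
# Hypothetical vocabulary tables; abbreviations follow this lesson's lists.
SHOT_SIZES = {
    "ELS": "extreme long shot",
    "LS": "long shot",
    "MLS": "medium long shot",
    "MS": "medium shot",
    "MCU": "medium close-up",
    "CU": "close-up",
    "XCU": "extreme close-up",
}
ANGLES = {"eye-level", "high angle", "low angle", "Dutch angle", "bird's-eye", "POV"}
FOCAL_LENGTHS_MM = {14, 24, 35, 50, 85, 135}


def build_shot_prompt(size, angle, focal_mm, subject):
    """Join one term from each axis with a subject description."""
    if size not in SHOT_SIZES:
        raise ValueError(f"unknown shot size: {size!r}")
    if angle not in ANGLES:
        raise ValueError(f"unknown angle: {angle!r}")
    if focal_mm not in FOCAL_LENGTHS_MM:
        raise ValueError(f"focal length not in the lesson's list: {focal_mm}")
    # Spell out the abbreviation so the prompt does not depend on the
    # model recognising "MCU" and similar shorthand.
    return f"{SHOT_SIZES[size]}, {focal_mm}mm lens, {angle}, {subject}"


print(build_shot_prompt("MCU", "low angle", 85, "LENA seated by a stone window at sunset"))
```

Rejecting terms outside the tables is a design choice for drilling the vocabulary: a misspelled or invented shot size fails immediately instead of quietly producing a vague prompt.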
# Prompt vocab cheat sheet — drop these into your tool
Shot size (distance):
ELS = extreme long shot (character is tiny, emphasises landscape)
LS = long shot (full body with surroundings)
MLS = medium long shot (knees up)
MS = medium shot (waist up — most common for dialogue)
MCU = medium close-up (chest up)
CU = close-up (head and shoulders)
XCU = extreme close-up (just eyes or mouth)
Angle:
eye-level
high angle (looking down; diminishes subject)
low angle (looking up — heroic)
Dutch angle (tilted — unease)
POV (subject's viewpoint)
Focal length:
14–24mm = wide, distorts, emphasises space
35mm = standard documentary/indie look
50mm = close to human vision — neutral
85mm = portrait, flattering face
135mm = telephoto, compressed background
Movement:
locked off, pan, tilt, dolly in/out, tracking, handheld, crane, steadicam
Example full prompt:
"MS, 35mm, eye-level, slow dolly-in on LENA seated by a stone window at sunset.
Warm key light from window, blue fill. Shallow depth of field."
Use these three prompts in order. Each builds on the one before.
In one paragraph, teach me the three axes of cinematic shot prompt vocabulary — shot size, angle, focal length — and why specificity here beats vague 'cinematic' descriptors.
Walk me through why a 35mm wide on a subject feels different from an 85mm close-up of the same subject — the optics, the perspective compression, the emotional register — and how AI video models have learned these associations.
I'm shooting a dialogue scene between two characters: ideally I want coverage (wide, two close-ups, an OTS each direction). Walk me through the prompts for each shot, how I maintain character consistency across them, and how I cut between them in the edit to maintain the 180-degree line.
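For the coverage exercise in the third prompt, one possible way to keep the five setups aligned is to generate them from a single character description and a fixed screen direction. This is a minimal sketch under stated assumptions: the `Character` class, the `coverage_prompts` helper, the second character OMAR, the wardrobe text, and the focal lengths chosen per setup are all hypothetical. Repeating a description verbatim is a common tactic, not a guarantee of consistent output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Character:
    name: str
    look: str  # fixed appearance text, repeated verbatim in every prompt


def describe(c):
    return f"{c.name} ({c.look})"


def coverage_prompts(a, b, setting):
    """Return five prompts: a wide, an OTS each direction, a close-up of each."""
    # The camera stays on one side of the line between the characters,
    # so a is always frame left looking right, b frame right looking left.
    return {
        "wide": f"LS, 35mm, eye-level, locked off. {describe(a)} frame left, "
                f"{describe(b)} frame right, {setting}.",
        "ots_a": f"MCU, 50mm, eye-level, over-the-shoulder from behind {b.name}. "
                 f"{describe(a)} looks frame right, {setting}.",
        "ots_b": f"MCU, 50mm, eye-level, over-the-shoulder from behind {a.name}. "
                 f"{describe(b)} looks frame left, {setting}.",
        "cu_a": f"CU, 85mm, eye-level, shallow depth of field. "
                f"{describe(a)} looks frame right, {setting}.",
        "cu_b": f"CU, 85mm, eye-level, shallow depth of field. "
                f"{describe(b)} looks frame left, {setting}.",
    }


lena = Character("LENA", "grey wool coat, short dark hair")
omar = Character("OMAR", "green jacket, round glasses")
for name, prompt in coverage_prompts(lena, omar, "stone room at sunset").items():
    print(f"{name}: {prompt}")
```

Because every prompt fixes which way each character looks, cutting between any two of the five setups keeps the characters facing each other across the cut, which is what the 180-degree line is meant to preserve.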