Stop conditions and max tokens

Difficulty: medium

Why this matters

Generation has to stop somewhere, and how it stops is a serving concern with cost and correctness implications. The model stops on an end-of-sequence token, a max-token limit, or a custom stop string, and each has gotchas. A max_tokens set too low truncates answers mid-sentence (finish_reason: length); a missing stop sequence lets the model ramble, wasting tokens and money; a stop string that also appears in legitimate output cuts answers short. Controlling stop conditions well is how you bound cost per request and avoid the truncation failures from the error taxonomy.
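To make the three mechanisms concrete, here is a minimal toy sketch of a decoding loop. It is not any provider's implementation: the `generate` function, the `Result` type, and the `<eos>` marker are hypothetical names invented for illustration, and "tokens" are just pre-split strings. The finish_reason values ("stop", "length") follow the naming used in this lesson; real APIs may use different field names and values.

```python
from dataclasses import dataclass

EOS = "<eos>"  # hypothetical end-of-sequence marker for this toy example


@dataclass
class Result:
    text: str
    finish_reason: str  # "stop" or "length", as named in this lesson
    tokens_used: int


def generate(token_stream, max_tokens, stop=()):
    """Consume tokens until EOS, a stop string, or the max_tokens limit."""
    text = ""
    used = 0
    for token in token_stream:
        if used >= max_tokens:
            # Budget exhausted before the model finished: truncation.
            return Result(text, "length", used)
        if token == EOS:
            # The model chose to end: clean completion.
            return Result(text, "stop", used)
        used += 1
        text += token
        for s in stop:
            idx = text.find(s)
            if idx != -1:
                # Stop string matched: cut the text just before it.
                return Result(text[:idx], "stop", used)
    return Result(text, "stop", used)


answer = ["The", " answer", " is", " 42", ".", EOS]
steps = ["Step", " 1", ":", " mix", ".\n", "Step", " 2", ":", " bake", ".", EOS]

clean = generate(answer, max_tokens=100)
truncated = generate(answer, max_tokens=3)
# Gotcha: "\n" is a legitimate part of a multi-step answer.
cut_short = generate(steps, max_tokens=100, stop=("\n",))

for r in (clean, truncated, cut_short):
    print(repr(r.text), r.finish_reason, r.tokens_used)
```

In this sketch the stop-string collision reports "stop", the same value as a clean completion, so the lost "Step 2" cannot be detected from finish_reason alone. That is why a stop string should be one that legitimate output will not contain.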

Demo

The demo shows the three stop mechanisms and the finish_reason that tells you which one fired. That distinction is essential: truncation calls for raising max_tokens or continuing the generation, while a clean completion can be returned as-is.
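One way to act on finish_reason is a bounded continuation loop, sketched below under stated assumptions. `call_model` is a hypothetical callable returning `(text, finish_reason)`, and `fake_model` is a toy stand-in that treats words as tokens; neither corresponds to a real client library. The loop re-prompts with the partial answer while the reason is "length" and gives up after `max_rounds`, so cost stays bounded.

```python
FULL_ANSWER = "one two three four five"  # what the toy model "wants" to say


def fake_model(prompt, max_tokens):
    """Toy stand-in for a model call: words are tokens, and the model
    resumes after whatever already follows 'A:' in the prompt."""
    words = FULL_ANSWER.split()
    already = prompt.split("A:", 1)[1].split()
    remaining = words[len(already):]
    chunk = remaining[:max_tokens]
    reason = "length" if len(remaining) > max_tokens else "stop"
    return "".join(" " + w for w in chunk), reason


def complete_with_continuation(call_model, prompt, max_tokens, max_rounds=3):
    """Return (text, complete). Continue while the model reports truncation,
    but never make more than max_rounds calls."""
    parts = []
    for _ in range(max_rounds):
        text, finish_reason = call_model(prompt + "".join(parts), max_tokens)
        parts.append(text)
        if finish_reason != "length":
            return "".join(parts), True   # clean completion
    return "".join(parts), False          # still truncated: surface as partial


prompt = "Q: count to five\nA:"
print(complete_with_continuation(fake_model, prompt, max_tokens=2))
print(complete_with_continuation(fake_model, prompt, max_tokens=2, max_rounds=2))
```

The second call runs out of rounds and returns `complete=False`, which lets the caller label the response as partial rather than pass truncated text along as if it were finished.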

Try it yourself

Trigger a 'max_tokens' stop and confirm the answer is truncated; handle it instead of returning garbage.
Add a stop_sequence and confirm generation halts when the model emits it (saving tokens).
Set a stop string that also appears in legitimate content and watch answers get cut short — a real bug.
Estimate the cost difference between an unbounded ramble and a properly stop-bounded response.
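For the cost estimate in the last exercise, a back-of-the-envelope calculation is enough. The sketch below uses made-up numbers throughout: the price per million output tokens, the token counts, and the request volume are all assumptions to replace with your provider's pricing and your own measurements.

```python
def output_cost(output_tokens, price_per_million_tokens):
    """Cost in dollars of the generated (output) tokens for one request."""
    return output_tokens * price_per_million_tokens / 1_000_000


PRICE = 10.0               # assumed: dollars per million output tokens
RAMBLE_TOKENS = 4000       # assumed: unbounded response that runs on
BOUNDED_TOKENS = 300       # assumed: response halted by a stop condition
REQUESTS_PER_DAY = 100_000 # assumed traffic

ramble = output_cost(RAMBLE_TOKENS, PRICE)
bounded = output_cost(BOUNDED_TOKENS, PRICE)
daily_savings = (ramble - bounded) * REQUESTS_PER_DAY

print(f"ramble:  ${ramble:.4f} per request")
print(f"bounded: ${bounded:.4f} per request")
print(f"savings: ${daily_savings:,.2f} per day")
```

With these assumed figures the ramble costs about $0.04 per request against $0.003 for the bounded response, a difference that only becomes significant once multiplied by request volume.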

Prompt your AI

Use these three in order. Each builds on the one before.

1. Basics & terminology

How does LLM generation decide to stop, and what are the three stop mechanisms?

2. Why it works (the mechanism)

Explain end-of-sequence tokens, max_tokens limits, and stop sequences, and how finish_reason tells me which fired and what to do about it.

3. Advanced — application & what's next

Design robust stop-condition handling for a production endpoint: choosing max_tokens, safe stop sequences, detecting and handling truncation (continue vs. surface partial), and bounding per-request cost.