Scope, success criteria, dataset

easy

Why this matters

Most RAG projects fail in week 3 because they were scoped too broadly and never had clear success criteria. Define scope as one specific use case ('answer engineering FAQ from internal wiki') and one quantitative bar ('recall@5 ≥ 0.85 on a 50-case eval, end-to-end p99 latency < 2 s, cost per query < $0.01'). Pick a real dataset you have access to: your team's docs, a public Wikipedia subset, or an open dataset. Without a concrete dataset and bar, you'll tune endlessly, because nothing tells you when you're done.
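
One way to make the bar checkable is to write it down as code. The sketch below is only an illustration: the names `SuccessBar`, `p99`, and `check_bar` are hypothetical, the thresholds are copied from the example bar above, and the run numbers are made up. It assumes you already have a recall@5 score, per-query latencies, and a total cost from one eval run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessBar:
    # Thresholds copied from the example bar in the text.
    min_recall_at_5: float = 0.85
    max_p99_latency_s: float = 2.0
    max_cost_per_query_usd: float = 0.01


def p99(latencies_s):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_s)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in integer arithmetic
    return ordered[rank - 1]


def check_bar(bar, recall_at_5, latencies_s, total_cost_usd):
    """Return a pass/fail flag per criterion for one eval run."""
    cost_per_query = total_cost_usd / len(latencies_s)
    return {
        "recall@5": recall_at_5 >= bar.min_recall_at_5,
        "latency_p99": p99(latencies_s) < bar.max_p99_latency_s,
        "cost_per_query": cost_per_query < bar.max_cost_per_query_usd,
    }


# Hypothetical numbers from one run over a 50-case eval set.
result = check_bar(
    SuccessBar(),
    recall_at_5=0.86,
    latencies_s=[0.5] * 49 + [1.9],
    total_cost_usd=0.40,
)
print(result)
```

Reporting each criterion separately, rather than a single pass/fail, shows which part of the bar a change broke.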

Demo

The right shape: 100-1000 documents, 1-3 doc types, 30-100 representative queries with known good answers. Define what 'good' means concretely (which chunks should be retrieved, what facts should appear in the answer). Spend Day 1 on this; everything else flows from it. The eval set is your spec — if you can't define it, you don't know what you're building.
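
As a rough sketch of what "defining good concretely" could look like, an eval case can record the query, the chunks that should be retrieved, and the facts that should appear in the answer. Everything here is an assumption for illustration: the `EvalCase` fields, the chunk ID format, the example query, and the choice of case-insensitive substring matching for facts (a real project might need something stricter or looser).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    query: str
    expected_chunk_ids: frozenset  # chunks that should be retrieved
    expected_facts: tuple          # facts that should appear in the answer

    def __post_init__(self):
        if not self.expected_chunk_ids:
            raise ValueError("an eval case needs at least one expected chunk")


def recall_at_k(case, retrieved_ids, k=5):
    """Fraction of the expected chunks found in the top-k retrieved IDs."""
    hits = case.expected_chunk_ids & set(retrieved_ids[:k])
    return len(hits) / len(case.expected_chunk_ids)


def missing_facts(case, answer):
    """Expected facts not found in the answer (case-insensitive substring match)."""
    lowered = answer.lower()
    return [fact for fact in case.expected_facts if fact.lower() not in lowered]


# Hypothetical case and retrieval result.
case = EvalCase(
    query="How do I rotate my API key?",
    expected_chunk_ids=frozenset({"auth-guide#3", "auth-guide#4"}),
    expected_facts=("settings page", "90 days"),
)
retrieved = ["auth-guide#3", "faq#1", "faq#7", "onboarding#2", "faq#9", "auth-guide#4"]

print(recall_at_k(case, retrieved))       # one of two expected chunks is in the top 5
print(missing_facts(case, "Open the Settings page and click Rotate."))
```

Averaging `recall_at_k` over all cases gives the recall@5 figure that a success bar can be checked against.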

Try it yourself

Write a 1-page project spec like the one described above. Include scope, success bar, dataset, and an eval set sketch.
Curate your real dataset. 100-1000 docs is the sweet spot for a first project.
Write your first 20 eval cases yourself (or with a teammate who knows the docs). Don't let an LLM write them.
Define 'out of scope' explicitly. Saying NO to scope creep is the most important skill.
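
If it helps, the checklist above can be turned into a small self-check on a draft spec. This is a minimal sketch under stated assumptions: `ProjectSpec` and `spec_gaps` are hypothetical names, the example values (400 docs, the out-of-scope items) are invented, and the numeric limits simply mirror the 100-1000 document and 20 hand-written case guidance for a first project.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectSpec:
    scope: str
    success_bar: str
    dataset: str
    doc_count: int
    handwritten_eval_cases: int
    out_of_scope: list = field(default_factory=list)


def spec_gaps(spec):
    """List what is still missing from a first-project spec."""
    gaps = []
    for name in ("scope", "success_bar", "dataset"):
        if not getattr(spec, name).strip():
            gaps.append(f"{name} is empty")
    if not 100 <= spec.doc_count <= 1000:
        gaps.append("doc_count is outside the 100-1000 range suggested for a first project")
    if spec.handwritten_eval_cases < 20:
        gaps.append("fewer than 20 hand-written eval cases")
    if not spec.out_of_scope:
        gaps.append("out_of_scope is not stated")
    return gaps


# Hypothetical spec for the engineering-FAQ example.
spec = ProjectSpec(
    scope="Answer engineering FAQ from internal wiki",
    success_bar="recall@5 >= 0.85 on a 50-case eval, p99 latency < 2 s, cost per query < $0.01",
    dataset="Internal engineering wiki export",
    doc_count=400,
    handwritten_eval_cases=20,
    out_of_scope=["HR questions", "code generation"],
)
print(spec_gaps(spec))
```

An empty list means the draft covers every item on the checklist; it says nothing about whether the scope itself is a good one.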

Prompt your AI

Use these three prompts in order. Each builds on the one before.

1. Basics & terminology

Why is scope + success criteria the most important Day-1 artifact?

2. Why it works (the mechanism)

Walk me through writing eval cases: what fields, sourced from where, how many to start.

3. Advanced — application & what's next

Design a spec for a 'compliance-Q&A' RAG over 5000 policy docs. What's the success bar, what's out of scope?