Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.
Most RAG projects fail in week 3 because they were scoped too broadly and never had clear success criteria. Define scope as one specific use case ('answer engineering FAQ from internal wiki') and one quantitative bar ('recall@5 ≥ 0.85 on a 50-case eval, end-to-end latency p99 < 2s, 0.01'). Pick a real dataset you have access to — your team's docs, public Wikipedia subset, an open dataset. Without a concrete dataset + bar, you'll endlessly tune.
The right shape: 100-1000 documents, 1-3 doc types, 30-100 representative queries with known good answers. Define what 'good' means concretely (which chunks should be retrieved, what facts should appear in the answer). Spend Day 1 on this; everything else flows from it. The eval set is your spec — if you can't define it, you don't know what you're building.
Use these three in order. Each builds on the one before.
Why is scope + success criteria the most important Day-1 artifact?
Walk me through writing eval cases: what fields, sourced from where, how many to start.
Design a spec for a 'compliance-Q&A' RAG over 5000 policy docs. What's the success bar, what's out of scope?
# project_spec.yaml
project: hybrid_rag_internal_docs
scope: |
Answer engineering-team questions from our internal wiki. Specifically:
- product architecture questions
- infra runbook lookups
- "how do I do X" common ops tasks
success_criteria:
retrieval_recall_at_5: 0.85 # measured on 50-case eval set
answer_accuracy: 0.80 # LLM-judge graded
p99_latency_ms: 2000
cost_per_query_usd: 0.01
dataset:
source: internal-wiki-export
doc_count: ~420
formats: [markdown, html]
total_size_mb: 28
eval_set:
path: ./evals/golden.jsonl
case_count: 50
format: |
{id, question, expected_chunk_ids: [...], expected_answer: "..."}
out_of_scope:
- personal advice
- questions outside engineering domain
- real-time queries (uses external data)
# Day-1 deliverable: this spec + the dataset + at least 20 cases in the eval set.
# Without this, you'll be tuning forever.python3 main.py