Every level of RAG — from naive cosine-on-embeddings to hybrid + reranking + multi-hop + agentic + production DevOps. Each module is the next move when the previous one breaks.
Ten modules, ~100 challenges that take you from the 30-line naive RAG everyone ships first, through hybrid search, reranking, query rewriting, multi-hop, evaluation, citations, agentic RAG, scale, and the production DevOps that makes it real. Code in Python (the AI default) with Node alternatives where it matters. Built on what real production RAG looks like in 2026: Anthropic's Citations API, prompt caching, Contextual Retrieval, MCP, and a properly layered eval discipline.
Built by Lakshya Kumar
We grant free access case-by-case — students, career-switchers, builders on a tight budget. Sign in to send us a note.
Complete all modules, then submit the required number of capstone projects. Each must earn a passing rating from an admin reviewer.
Pick a real corpus (your company docs, a public dataset like Wikipedia or arXiv, or your own knowledge base). Ship a production-quality RAG with: chunking dispatcher, hybrid + rerank, query rewriting, agentic mode for complex queries, citations, eval gate in CI, live monitoring, and the full production checklist. Submit the live URL + the metrics + the checklist.
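To make the "hybrid + rerank" requirement concrete, here is a minimal sketch of one common way to merge keyword and vector results: reciprocal rank fusion (RRF). This is an illustrative assumption, not necessarily the fusion method the course teaches. The function name, the document ids, and `k=60` (a conventional default, not a tuned value) are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one ranked list.

    Each doc scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default (assumption).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a keyword (e.g. BM25) retriever and a vector retriever.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that both retrievers rank highly rise to the top. A reranker would then typically rescore only the first few fused results.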
Build an eval framework that another team could drop in: golden-set tooling, retrieval + generation metrics, LLM-as-judge with calibration, CI gate, online signal integration, dashboard. Submit the framework as a small npm/pypi package or repo.
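As a rough sketch of what the retrieval-metrics and CI-gate pieces might look like, the snippet below computes recall@k and mean reciprocal rank over a tiny golden set and compares the result to a threshold. The golden-set shape, the document ids, and the `0.7` threshold are hypothetical placeholders, not values from the course.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant doc ids that appear in the top-k retrieved ids."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant doc id, or 0.0 if none was retrieved."""
    relevant = set(relevant)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical golden set: retriever output paired with labelled relevant ids.
golden_set = [
    {"retrieved": ["d1", "d2", "d3"], "relevant": ["d2"]},
    {"retrieved": ["d4", "d5", "d6"], "relevant": ["d4", "d9"]},
]

mean_recall = sum(
    recall_at_k(ex["retrieved"], ex["relevant"], k=3) for ex in golden_set
) / len(golden_set)
mrr = sum(
    reciprocal_rank(ex["retrieved"], ex["relevant"]) for ex in golden_set
) / len(golden_set)

# Assumed CI gate: fail the build when recall@3 drops below the threshold.
RECALL_THRESHOLD = 0.7
passes_gate = mean_recall >= RECALL_THRESHOLD
print(mean_recall, mrr, passes_gate)
```

In a CI job, a failing gate would usually exit non-zero so that a retrieval regression blocks the merge.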
Paste this into any AI chat. Fill in the bracketed parts with your context — you'll get back a straight answer on whether this belongs on your plate.
I'm taking a "RAG Systems" course that runs from naive RAG through hybrid + rerank, query rewriting, multi-hop, evals, citations, agentic RAG, scale, and production DevOps. It uses Python (with Node alternatives) and lots of real 2026 production tricks (Anthropic Citations API, prompt caching, Contextual Retrieval, MCP).

Here's my context:
1. My current product/project is: [describe]
2. My current RAG state: [haven't built one / naive prototype / shipping / scaling]
3. My corpus: [size, doc types, growth rate]
4. Where I think RAG is failing me: [my guess]

Given that, answer:
- Which module should I prioritize, and why?
- Name 3 concrete wins this course would unlock for my situation.
- Name 1 thing the course won't help me with so I don't have wrong expectations.
- If I only had 2 hours this week, which single technique gives me the biggest lift? How would I measure that it worked?
Build an agentic RAG that uses ≥3 tools (docs, structured DB, web). Include streaming, observability, retry/reformulate, citations. Submit the live demo + traces.
Scale your RAG to 100 QPS sustained: prompt caching, multi-layer caches, batched ingest, vector topology, per-tenant isolation, cost dashboard. Submit load-test results + cost numbers.
Pick a domain you know nothing about, build a RAG on it in 1 week. Document every wrong turn, every fix, every metric. Submit the writeup — it's the most honest learning artifact in this course.
Module 2 leans on this directly. Pair with the Citations API doc.