Agentic and Applied AI / Course

MLOps & LLMOps: The Production Lifecycle

Run ML and LLM systems in production the way it's actually done: experiment tracking, model/prompt registries with lineage, CI/CD with eval gates, safe deployment patterns, automated LLM evals, observability, guardrails, and a continuous improvement loop. The discipline that turns a working demo into a system you can trust at scale.

Free preview

Certificate: 1 of 5 capstones

Ten modules, ~100 challenges on the operational discipline behind reliable ML and LLM products. Python-first and 2026-current: it bridges classic MLOps (MLflow experiment tracking, DVC data versioning, Airflow/Dagster pipelines, model registries, GitHub Actions CI/CD, canary and blue/green deploys) to LLMOps, where for many apps you never train a model and the real artifacts are prompts, RAG configs, and agents. You'll build automated LLM evals (golden sets, LLM-as-judge done right, regression gates, evals that survive rate limits), production observability with Langfuse-style tracing and drift detection, a guardrail layer with red-team tests, and finally close the lifecycle loop end to end. Every module ships runnable code and a project; the through-line is treating prompts and models as versioned, tested, monitored artifacts instead of strings you edit and pray over.

Built by Lakshya Kumar

mlops

llmops

evals

observability

ci-cd

model-registry

guardrails

Before you start4 items

Comfortable in Python; you've trained or called a model before.
Basic git and CI familiarity (you've opened a pull request and seen a workflow run).
You've built a simple ML or LLM app you can wrap with MLOps practices.
An API key (free tier is fine — the eval and observability modules teach you to work within rate limits).

Is this course for you?Ask an AI

Paste this into any AI chat. Fill in the bracketed parts with your context — you'll get back a straight answer on whether this belongs on your plate.

Get access to MLOps & LLMOps: The Production Lifecycle

$3.99

30-day access

Prefer the whole catalog? See all-access membership.

Ask for access

We grant free access case-by-case — students, career-switchers, builders on a tight budget. Sign in to send us a note.

Capstone projects

Submit any 1 of 5 to earn the certificate

Complete all modules, then submit the required number of capstone projects. Each must earn a passing rating from an admin reviewer.

capstoneAn end-to-end production lifecycle for one feature

Take one real ML model or LLM feature all the way around the loop: register it (with a model card, provenance, and lineage), gate it with an automated eval suite in CI, deploy it via canary behind feature flags with automated rollback, wrap it in an observability + guardrail layer, and write the runbooks. Submit the repo, a CI run showing the eval gate, a deployment + rollback demo, an observability dashboard, and the runbooks.

Submit production lifecycleMinimum rating for approval: 3/5

llm-eval-packAn automated LLM eval pack with a CI gate

Further reading & study material6 sources

Prompt

I'm taking an "MLOps & LLMOps: The Production Lifecycle" course — the operational discipline behind reliable ML and LLM systems. It covers experiment tracking (MLflow/W&B), data & feature pipelines (DVC/Airflow), model & prompt registries with lineage, CI/CD with eval gates (GitHub Actions), deployment patterns (shadow/canary/blue-green/A-B), automated LLM evals (golden sets, LLM-as-judge, regression gates), production observability (Langfuse tracing, drift detection), guardrails & safety (injection defense, PII, moderation), and closing the continuous loop. Python-first, bridging classic MLOps to 2026 LLMOps.

Here's my context:
1. What I'm building/operating: [describe the ML model or LLM feature/product]
2. Do I train/fine-tune anything, or is every "model" a hosted API call? [train / fine-tune / pure API / mix]
3. My current maturity: [level 0 notebook + manual deploy / level 1 some automation / level 2 full CI-CD]
4. Where it hurts most: [can't reproduce results / no evals / regressions ship silently / drift / cost / safety / slow iteration]

Given that, answer:
- Which module should I prioritize first and why, given my maturity level?
- Which is my single highest-leverage gap (tracking / data versioning / registry / CI eval gate / observability / guardrails / the loop)?
- Name 3 concrete changes I could make this week, and how I'd measure that each one helped.
- Name 1 thing this course won't fix so I have the right expectations.

Build a reusable eval pack another team could drop onto their LLM app: a versioned golden dataset, reference-based + reference-free checks, a calibrated LLM-as-judge with bias controls, a resilient runner (concurrency caps, retries, checkpointing, dead-letter) that survives rate limits, and a CI regression gate. Submit it as a small package or repo with docs and a sample run.

Submit eval packMinimum rating for approval: 3/5

observability-drift-dashboardA production observability + drift dashboard

Instrument a real LLM app end to end: nested-span tracing, responsible logging (sampling + PII redaction), error-type classification, data/concept/output drift detection, online quality signals, and a dashboard with symptom-level alerts. Submit the live dashboard, a debugged trace from a failing request, and proof that a drift or cost spike fires the right alert.

Submit observability dashboardMinimum rating for approval: 3/5

guardrails-redteamA guardrails layer with red-team tests

Wrap an LLM feature in a complete guardrail layer (input sanitization, PII redaction, prompt-injection defense, content moderation, schema validation/repair, fallbacks + human-in-the-loop) and red-team it with attack tests for injections, PII leaks, malformed output, and policy violations. Submit the implementation and the red-team suite proving the guardrails hold.

Submit guardrails layerMinimum rating for approval: 3/5

iteration-loopAn end-to-end retraining/iteration loop

Close the lifecycle loop for a real system: production feedback feeds the golden/training set, an automated trigger drives iteration, the eval gate guards quality, a gated deploy ships behind guardrails, and observability feeds the next turn — plus a unified deployment manifest, cost governance with a spend cap, and a tested runbook. Submit a demo of one full automated loop turn and the manifest/runbook artifacts.

Submit iteration loopMinimum rating for approval: 3/5

The tracking + registry tool used across Modules 2, 4, and 5. Keep it open while you build.

MLOps & LLMOps: The Production Lifecycle

What MLOps Actually Is

Experiment Tracking & Reproducibility

Data & Feature Pipelines

Model Registries & Lineage

CI/CD for Models

Deployment Patterns

Automated LLM Evaluations

Production Observability

Guardrails & Safety in Production

The Continuous Lifecycle