Project · Map your serving requirements and pick a stack

project

hard

module project

Ship something real. Submit your work when you're done.

Brief

Take a realistic multi-model fleet (yours or the course example) and produce a serving requirements document: the model inventory with frameworks/SLAs/memory, a GPU-sharing plan, a versioning strategy, and a justified build-vs-buy recommendation among roll-your-own, Triton, and NIM. This is the planning artifact every later module builds on.

Deliverables

A model inventory table: name, framework, GPU memory, latency SLA, and traffic class for at least 6 models.
A GPU-sharing plan stating which models co-locate, on what device size, and how the latency-critical ones are protected.
A versioning + rollback strategy (version policy and canary plan) written as a short runbook.
A build-vs-buy recommendation (roll-your-own / Triton / NIM) with a weighted decision matrix justifying it.

How we grade it

The inventory has at least 6 models spanning at least 3 frameworks with concrete SLA numbers, not placeholders.
The GPU-sharing plan fits within stated device memory and explicitly names how the SLA-critical model is protected (priority/rate limit/dedicated device).
The versioning strategy specifies a canary fraction and an explicit, measurable rollback trigger.
The build-vs-buy recommendation follows from the weighted matrix and names the single deciding factor.

Project · Map your serving requirements and pick a stack

Hints

Expected output

Stretch goals