Take a realistic multi-model fleet (yours or the course example) and produce a serving requirements document: the model inventory with frameworks/SLAs/memory, a GPU-sharing plan, a versioning strategy, and a justified build-vs-buy recommendation among roll-your-own, Triton, and NIM. This is the planning artifact every later module builds on.
A 1-2 page requirements doc whose decision matrix output and GPU plan are internally consistent (e.g. 'Triton, because fleet spans 4 frameworks and we need version canaries; chat-llm gets a dedicated 80GB GPU, the 4 small models share a second one with rate limits on the bulk embedder').