Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.
What turns model serving from artisanal to industrial is a standardized layout: a model repository in which every model is a directory with a known structure (a config file plus numbered version folders holding the weights) that the server discovers and loads automatically. Once deployment means "drop a correctly shaped folder into the repository", you get reproducibility, code review, CI, and rollback almost for free, because deploying a model becomes a file operation under version control. Container images brought the same shift to application deployment. The model-repository convention underpins everything else in Triton, which is why this course keeps coming back to config.pbtxt and versioned directories.
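To make "the model's contract" concrete, here is a minimal sketch that creates the embeddings model directory and writes a config.pbtxt for it. The field names (name, backend, max_batch_size, input, output, version_policy, dynamic_batching) are standard Triton model-configuration fields. The tensor names, data types, dims, and batch size are hypothetical placeholders, and real values must match the exported ONNX model.

```shell
#!/usr/bin/env sh
# Sketch: create one model directory with a minimal config.pbtxt.
# Hypothetical values: tensor names, data types, dims, max_batch_size.
set -eu

mkdir -p model_repository/embeddings/1
# The exported weights would go in model_repository/embeddings/1/model.onnx
# (not created here).

cat > model_repository/embeddings/config.pbtxt <<'EOF'
name: "embeddings"          # must match the directory name
backend: "onnxruntime"      # backend that loads model.onnx
max_batch_size: 32          # hypothetical
input [
  {
    name: "input_ids"       # hypothetical tensor name
    data_type: TYPE_INT64
    dims: [ -1 ]            # variable-length sequence
  }
]
output [
  {
    name: "embedding"       # hypothetical tensor name
    data_type: TYPE_FP32
    dims: [ 768 ]           # hypothetical embedding width
  }
]
version_policy: { latest: { num_versions: 1 } }  # serve only the newest version
dynamic_batching { }        # let the server group requests into batches
EOF
```

Because the contract lives in a plain text file next to the weights, a change to batching or tensor shapes shows up in a diff and goes through review like any other code change.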
The demo lays out a Triton-style model repository as a directory tree, showing that each model is self-describing: a config plus numbered version folders holding the actual weights.
Use these three in order. Each builds on the one before.
What is a model repository in an inference server, and why does a standardized directory layout matter for deployment?
Walk me through how a server discovers and loads models from a conventionally-structured repository of config files and versioned weight folders.
Given a model repository under version control, how would I design CI/CD so that adding or updating a model directory is a safe, reviewable, revertible deployment?
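One possible starting point for the CI question in the third prompt is a pre-merge check that rejects malformed model directories before they reach the serving repository. The sketch below is an assumption-laden example, not a Triton tool. `validate_repo` is a hypothetical helper that performs three checks:

- It requires a config.pbtxt in every model directory.
- If a config declares a top-level `name:`, it checks that the name matches the directory name, as Triton expects.
- It requires at least one numeric version folder.

It returns non-zero on any error so that a CI job fails.

```shell
#!/usr/bin/env sh
# Hypothetical pre-merge check for a model repository (not a Triton tool).
# Prints one line per problem to stderr; returns 0 only if no problems.
validate_repo() {
  repo="$1"
  errors=0
  for model_dir in "$repo"/*/; do
    [ -d "$model_dir" ] || continue          # glob matched nothing
    model=$(basename "$model_dir")
    config="${model_dir}config.pbtxt"
    if [ ! -f "$config" ]; then
      echo "ERROR $model: missing config.pbtxt" >&2
      errors=$((errors + 1))
      continue
    fi
    # A top-level name field, if present, must match the directory name.
    declared=$(sed -n 's/^name:[[:space:]]*"\([^"]*\)".*/\1/p' "$config")
    if [ -n "$declared" ] && [ "$declared" != "$model" ]; then
      echo "ERROR $model: config name \"$declared\" does not match directory" >&2
      errors=$((errors + 1))
    fi
    # At least one version folder with a numeric, non-0-prefixed name.
    has_version=no
    for version_dir in "$model_dir"*/; do
      [ -d "$version_dir" ] || continue
      case "$(basename "$version_dir")" in
        0*|*[!0-9]*) continue ;;
      esac
      has_version=yes
    done
    if [ "$has_version" = no ]; then
      echo "ERROR $model: no numeric version folder" >&2
      errors=$((errors + 1))
    fi
  done
  [ "$errors" -eq 0 ]
}
```

In a pipeline this might run as `validate_repo model_repository || exit 1` on every pull request. Rollback then stays an ordinary revert of the offending commit.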
# A Triton model repository is just a conventionally shaped directory tree.
model_repository/
  embeddings/
    config.pbtxt    # the model's contract: backend, inputs, outputs, batching
    1/              # version 1
      model.onnx
    2/              # version 2 (newer); by default only the latest is served, version_policy can serve both
      model.onnx
  chat-llm/
    config.pbtxt
    1/
      model.engine  # a TensorRT-LLM engine, for example
  reranker/
    config.pbtxt
    1/
      model.pt      # a TorchScript / PyTorch model
# Deploying = adding/updating a directory under version control. That's the whole idea.
find model_repository -maxdepth 2 -type d
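To make the discovery step concrete, here is a minimal shell sketch of what a server does when it scans this tree, assuming the layout above. It is not Triton's loader, and the function name `discover_models` is hypothetical. Its rules are as follows:

- It treats config.pbtxt as mandatory, which is stricter than Triton. Triton can auto-complete a config for some backends.
- It ignores version folders whose names are non-numeric or start with 0, which is how Triton's documentation describes version directories being filtered.
- It reports the highest remaining number, mimicking the default "latest, one version" policy.

```shell
#!/usr/bin/env sh
# Hypothetical sketch of repository discovery; not Triton's actual loader.
# For each model directory: require config.pbtxt, collect usable version
# folders, and report the highest one (like a "latest, 1 version" policy).
discover_models() {
  repo="$1"
  for model_dir in "$repo"/*/; do
    [ -d "$model_dir" ] || continue          # glob matched nothing
    model=$(basename "$model_dir")
    if [ ! -f "${model_dir}config.pbtxt" ]; then
      echo "skip $model: no config.pbtxt" >&2
      continue
    fi
    latest=""
    for version_dir in "$model_dir"*/; do
      [ -d "$version_dir" ] || continue
      v=$(basename "$version_dir")
      case "$v" in
        0*|*[!0-9]*) continue ;;             # ignore non-numeric or 0-prefixed names
      esac
      if [ -z "$latest" ] || [ "$v" -gt "$latest" ]; then
        latest="$v"
      fi
    done
    if [ -z "$latest" ]; then
      echo "skip $model: no usable version folder" >&2
      continue
    fi
    echo "$model -> version $latest"
  done
}
```

Running `discover_models model_repository` against the tree above prints one line per model, such as `embeddings -> version 2`. Adding a folder `3/` changes what gets served without touching any server code.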