The device plugin model

easy

Learn with your AI

Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.


Why this matters

The device plugin is the bridge between vendor hardware and the kubelet, and it follows a precise contract: a small DaemonSet pod runs on each GPU node, registers with the kubelet over gRPC on a Unix socket, lists the GPUs it found, and then allocates specific devices to pods at start time. Understanding this contract demystifies a whole class of failures (a missing socket, a crashed plugin pod, a driver mismatch) that otherwise show up as 'my GPU pod is stuck Pending forever'. Once you know the plugin is the component that advertises and allocates devices, you know exactly where to look when GPUs vanish from a node.
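To make the three steps concrete, here is a toy, in-process Python model of the contract. It is a sketch, not the real API. The actual plugin speaks gRPC: it calls Register on the kubelet's Registration service, then serves a DevicePlugin service whose ListAndWatch streams the device list and whose Allocate is called at container start. In the real flow the kubelet chooses the device IDs and the plugin's Allocate returns the environment variables, mounts, and device nodes the container needs. The names below (ToyKubelet, plugin_gone, the gpu-0 style IDs) are hypothetical, and the model collapses choosing and allocating into one method.

```python
"""Toy, in-process model of the device plugin contract (hypothetical names, not the gRPC API)."""
from dataclasses import dataclass, field


@dataclass
class Device:
    id: str
    healthy: bool = True


@dataclass
class ToyKubelet:
    # resource name (e.g. "nvidia.com/gpu") -> {device ID -> Device}
    devices: dict = field(default_factory=dict)
    # device IDs already handed to running containers
    in_use: set = field(default_factory=set)

    def register(self, resource_name: str) -> None:
        # Step 1: the plugin announces which extended resource it serves.
        self.devices.setdefault(resource_name, {})

    def list_and_watch(self, resource_name: str, found: list) -> None:
        # Step 2: each update carries the full device list and replaces the previous one.
        self.devices[resource_name] = {d.id: d for d in found}

    def plugin_gone(self, resource_name: str) -> None:
        # A crashed plugin can no longer vouch for its devices, so treat them all as unhealthy.
        for d in self.devices.get(resource_name, {}).values():
            d.healthy = False

    def allocatable(self, resource_name: str) -> int:
        # Only healthy devices count toward what the scheduler can place pods on.
        return sum(d.healthy for d in self.devices.get(resource_name, {}).values())

    def allocate(self, resource_name: str, count: int) -> list:
        # Step 3: at container start, hand out `count` free, healthy devices.
        free = sorted(
            d.id for d in self.devices.get(resource_name, {}).values()
            if d.healthy and d.id not in self.in_use
        )
        if len(free) < count:
            raise RuntimeError(f"insufficient {resource_name}")
        chosen = free[:count]
        self.in_use.update(chosen)
        return chosen


kubelet = ToyKubelet()
kubelet.register("nvidia.com/gpu")
kubelet.list_and_watch("nvidia.com/gpu", [Device("gpu-0"), Device("gpu-1")])
print(kubelet.allocatable("nvidia.com/gpu"))  # 2
print(kubelet.allocate("nvidia.com/gpu", 1))  # ['gpu-0']
kubelet.plugin_gone("nvidia.com/gpu")
print(kubelet.allocatable("nvidia.com/gpu"))  # 0
```

The last three lines mirror the failure the lesson describes: the GPUs are still "there", but once the plugin is gone nothing vouches for them, so the schedulable count falls to zero.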

Demo

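The device plugin runs as a DaemonSet in kube-system (or in the GPU operator's namespace). Its pod serves a gRPC endpoint on a Unix socket under /var/lib/kubelet/device-plugins/, registers that endpoint with the kubelet through the kubelet's own kubelet.sock in the same directory, and reports the node's GPUs. If that pod is unhealthy, the node's allocatable nvidia.com/gpu count drops to zero.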
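One quick check of the socket half of this contract is to look for socket files in that directory, either on the node itself or from a pod that mounts it as a hostPath. The sketch below assumes the kubelet's registration socket is named kubelet.sock, as in upstream Kubernetes. Plugin socket names are vendor-specific, so it simply reports every other socket it finds. The function names are hypothetical.

```python
import os
import stat

# Directory named in this lesson; the kubelet's own registration socket lives here too.
DEVICE_PLUGIN_DIR = "/var/lib/kubelet/device-plugins"
KUBELET_SOCKET = "kubelet.sock"


def inspect_plugin_dir(path: str = DEVICE_PLUGIN_DIR) -> dict:
    """List the Unix sockets in the device-plugin directory, split into
    the kubelet's registration socket and everything else (plugin sockets)."""
    if not os.path.isdir(path):
        return {"kubelet_socket": False, "plugin_sockets": []}
    sockets = [
        name for name in sorted(os.listdir(path))
        # lstat so that a symlink is not mistaken for the socket it points to
        if stat.S_ISSOCK(os.lstat(os.path.join(path, name)).st_mode)
    ]
    return {
        "kubelet_socket": KUBELET_SOCKET in sockets,
        "plugin_sockets": [s for s in sockets if s != KUBELET_SOCKET],
    }


def summarize(report: dict) -> str:
    """Turn the report into a one-line hint about which side is missing."""
    if not report["kubelet_socket"]:
        return "no kubelet.sock: the kubelet is not accepting plugin registrations"
    if not report["plugin_sockets"]:
        return "no plugin socket: the device plugin pod is probably not running"
    return "plugin socket(s) present: " + ", ".join(report["plugin_sockets"])
```

On a node, `print(summarize(inspect_plugin_dir()))` gives a first hint. A socket file can outlive a crashed plugin, so finding one is necessary but not sufficient; the plugin pod's status and logs are still the next thing to check.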

Try it yourself

Locate the device plugin DaemonSet and confirm one pod is running per GPU node.
Read a plugin pod's logs and find the 'Registered device plugin' line proving it talked to the kubelet.
Delete a device plugin pod and watch the node's nvidia.com/gpu count briefly drop, then recover when the DaemonSet recreates it.
Confirm the plugin pod runs on GPU nodes only (check its nodeSelector or the operator's node labels).
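Most of these steps are kubectl one-liners. The Python helpers below are a sketch of how to check their results programmatically. They assume you have captured the JSON from `kubectl get nodes -o json` and `kubectl get daemonset <name> -n <namespace> -o json`, plus the text of `kubectl logs <plugin-pod>`. The helper names are hypothetical, and the log phrase is the one this lesson quotes, so the exact wording may vary between plugin versions.

```python
import json

RESOURCE = "nvidia.com/gpu"
# Phrase quoted in this lesson; exact wording may differ between plugin versions.
REGISTERED_MARKER = "Registered device plugin"


def daemonset_ready(ds_json: str) -> tuple:
    """(desired, ready) pod counts from the DaemonSet's status.
    With correct targeting, desired should equal the number of GPU nodes."""
    status = json.loads(ds_json).get("status", {})
    return status.get("desiredNumberScheduled", 0), status.get("numberReady", 0)


def node_selector(ds_json: str) -> dict:
    """The pod template's nodeSelector. Empty means the selector does not restrict
    placement, so check node affinity and the operator's node labels instead."""
    spec = json.loads(ds_json).get("spec", {})
    return spec.get("template", {}).get("spec", {}).get("nodeSelector", {})


def plugin_registered(log_text: str) -> bool:
    """True if a plugin pod's log contains the registration line."""
    return any(REGISTERED_MARKER in line for line in log_text.splitlines())


def gpu_counts(nodes_json: str) -> dict:
    """Map node name -> (capacity, allocatable) for nvidia.com/gpu.
    Nodes that do not advertise the resource report (0, 0)."""
    counts = {}
    for node in json.loads(nodes_json).get("items", []):
        status = node.get("status", {})
        capacity = int(status.get("capacity", {}).get(RESOURCE, "0"))
        allocatable = int(status.get("allocatable", {}).get(RESOURCE, "0"))
        counts[node["metadata"]["name"]] = (capacity, allocatable)
    return counts
```

For example, save `kubectl get nodes -o json > nodes.json` and call `gpu_counts(open("nodes.json").read())`. For the delete-and-recover step, capture the counts before deleting the plugin pod, right after, and again once the DaemonSet has recreated it. Allocatable is the figure most likely to dip; capacity may stay unchanged.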

Prompt your AI

Use these three in order. Each builds on the one before.

1. Basics & terminology

In one paragraph, explain what the NVIDIA device plugin is and why Kubernetes needs it to use GPUs.

2. Why it works (the mechanism)

Walk me through the device plugin lifecycle: how it registers with the kubelet, lists devices, and allocates a specific GPU to a pod at startup.

3. Advanced — application & what's next

A GPU node suddenly shows nvidia.com/gpu: 0 while the GPUs are physically fine. Using the device plugin model, give me an ordered debugging plan.