Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.
Before the GPU Operator automates everything, it is worth installing the bare device plugin once by hand, because it teaches you the prerequisites the operator otherwise hides: the NVIDIA driver must already be on the node, the NVIDIA container runtime must be configured as the default, and only then does the plugin DaemonSet have anything to advertise. Doing this manually once means that when the operator later breaks, you can reason about which of those three layers failed. This is the foundational install every other module assumes is working.
With the driver and container runtime already present on the node, installing the device plugin is a single DaemonSet apply. The DaemonSet lands on every node; on GPU nodes it discovers devices, on CPU-only nodes it finds none and idles.
Use these three in order. Each builds on the one before.
In one paragraph, what are the prerequisites for the NVIDIA device plugin to work, beyond just applying its DaemonSet?
Walk me through what has to be true on a node — driver, container runtime, plugin — for nvidia.com/gpu to appear, and the order those layers depend on each other.
Given the manual device plugin install, what failure modes does it expose that the GPU Operator would otherwise hide, and how does that knowledge help debugging later?
# nvidia-device-plugin.yml — apply with: kubectl apply -f nvidia-device-plugin.yml
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: nvidia-device-plugin-daemonset
namespace: kube-system
spec:
selector:
matchLabels:
name: nvidia-device-plugin-ds
template:
metadata:
labels:
name: nvidia-device-plugin-ds
spec:
tolerations:
- key: nvidia.com/gpu
operator: Exists
effect: NoSchedule
priorityClassName: system-node-critical
containers:
- name: nvidia-device-plugin-ctr
image: nvcr.io/nvidia/k8s-device-plugin:v0.17.0
securityContext:
privileged: true
volumeMounts:
- name: device-plugin
mountPath: /var/lib/kubelet/device-plugins
volumes:
- name: device-plugin
hostPath:
path: /var/lib/kubelet/device-plugins