Capstok — learn by doing

Why this matters

For CPU and memory, Kubernetes lets you set requests (the guaranteed minimum) separately from limits (the ceiling). The scheduler packs nodes by requests, so the gap between request and limit is what makes bin-packing flexible and overcommit possible. GPUs break that model. Because they are exposed as an extended resource (nvidia.com/gpu), a request, if you set one, must equal the limit, and a request without a limit is rejected. In practice you set only the limit and let the request default to it. Builders who carry their CPU intuition over to GPUs write specs that get rejected or, more subtly, set a limit expecting it to act as a ceiling and end up reserving that many whole GPUs. Knowing that GPUs are an all-or-nothing, request-equals-limit resource is what keeps your manifests valid and your scheduling predictable.
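A minimal sketch of the contrast in one container, assuming the same CUDA image used in the demo spec; the pod name mixed-resources and the CPU amounts are hypothetical. CPU keeps a request below its limit, while the GPU sets only a limit and lets the API server default the request to the same value.

```yaml
# Hypothetical example: CPU may be overcommitted, the GPU may not.
apiVersion: v1
kind: Pod
metadata:
  name: mixed-resources          # hypothetical name
spec:
  containers:
    - name: app
      image: nvcr.io/nvidia/cuda:12.5.0-base-ubuntu22.04
      command: ["sleep", "3600"]
      resources:
        requests:
          cpu: "500m"            # the scheduler reserves this much
        limits:
          cpu: "2"               # the container may burst up to this; the gap is overcommit
          nvidia.com/gpu: 1      # no GPU request given, so it defaults to 1, equal to the limit
```

Because the CPU request and limit differ, this pod lands in the Burstable QoS class; the GPU line plays no part in that classification.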

Demo

Set a GPU request smaller than its limit and the API server rejects the pod at validation time. Extended resources require request == limit whenever both are set. This rule does not change the pod's QoS class, which Kubernetes computes from CPU and memory only.

Try it yourself

Apply the bad spec and read the exact API server rejection message about requests equaling limits.
Apply the good spec (GPU limit only) and confirm with kubectl get pod -o yaml that a matching nvidia.com/gpu request was filled in from the limit.
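Check the pod's QoS class with kubectl describe pod and note that the GPU entry does not decide it: a pod that sets only a GPU limit is BestEffort, and it becomes Guaranteed only when every container's CPU and memory requests also equal their limits.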
Reason about why you cannot overcommit GPUs the way you overcommit CPU by setting requests below limits.
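If you do want a GPU pod in the Guaranteed class, the CPU and memory settings have to qualify on their own. A minimal sketch, assuming the same image as the demo; the pod name gpu-guaranteed and the CPU and memory amounts are hypothetical and illustrative, not recommendations.

```yaml
# Hypothetical example: Guaranteed QoS comes from CPU and memory, not from the GPU.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-guaranteed           # hypothetical name
spec:
  containers:
    - name: app
      image: nvcr.io/nvidia/cuda:12.5.0-base-ubuntu22.04
      command: ["sleep", "3600"]
      resources:
        limits:                  # requests omitted, so each one defaults to its limit
          cpu: "2"
          memory: 4Gi
          nvidia.com/gpu: 1
# Check the class with:
#   kubectl get pod gpu-guaranteed -o jsonpath='{.status.qosClass}'
```

Dropping the cpu and memory lines while keeping the GPU limit would turn the same pod into BestEffort.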

Prompt your AI

Use these three in order. Each builds on the one before.

1. Basics & terminology

In one paragraph, why must GPU requests equal GPU limits in Kubernetes when CPU and memory don't have that rule?

2. Why it works (the mechanism)

Walk me through how Kubernetes handles requests vs limits for extended resources like nvidia.com/gpu, and whether that affects the pod's QoS class.

3. Advanced — application & what's next

Given that GPUs can't be overcommitted via request/limit gaps, how do MIG and time-slicing reintroduce sharing, and what do they trade away compared to true overcommit?


```yaml
# This spec is REJECTED: for extended resources, request must equal limit.
apiVersion: v1
kind: Pod
metadata:
  name: bad-gpu-request
spec:
  containers:
    - name: app
      image: nvcr.io/nvidia/cuda:12.5.0-base-ubuntu22.04
      command: ["sleep", "3600"]
      resources:
        requests:
          nvidia.com/gpu: 1   # request (1) differs from the limit below...
        limits:
          nvidia.com/gpu: 2   # ...so API server validation fails: requests must equal limits
```
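
For contrast, a minimal sketch of a spec the API server accepts, assuming the same image; the pod name good-gpu-request is hypothetical. It sets only the GPU limit, and the request is filled in from it.

```yaml
# Accepted: only the limit is set; the API server defaults the request to match it.
apiVersion: v1
kind: Pod
metadata:
  name: good-gpu-request         # hypothetical name
spec:
  containers:
    - name: app
      image: nvcr.io/nvidia/cuda:12.5.0-base-ubuntu22.04
      command: ["sleep", "3600"]
      resources:
        limits:
          nvidia.com/gpu: 1      # request defaults to 1; writing requests: 1 explicitly is also valid
```

After kubectl apply, kubectl get pod good-gpu-request -o yaml should list nvidia.com/gpu under both requests and limits. The pod stays Pending if no node advertises nvidia.com/gpu, but defaulting and validation happen when the pod is created, so the spec check works on any cluster.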

Limits vs requests for GPUs