Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.
A service that 'feels slow' is not fixable — there is no test for feelings, no alert that fires on feelings, and no way to declare victory against feelings. A Service Level Objective (SLO) converts subjective discomfort into a measurable, alertable target: 'p99 latency < 300 ms, measured over a rolling 7-day window.' This single sentence defines what fast means for your service, what metric to instrument, at what threshold to page the on-call engineer, and when a performance improvement is officially done. Without an SLO, every optimisation is driven by whoever complains loudest rather than by data — and performance work never ends because 'done' is never defined.
SLOs are written as percentile budgets over rolling windows ("p99 < 500 ms over the last 30 days"), not as instantaneous snapshots. Evaluating a rolling window against a threshold shows how much of the error budget remains before a breach, which drives prioritization: a service burning budget at 2x the safe rate will, if that rate is sustained, exhaust its budget halfway through the window, so it needs attention today, not at the next quarterly review. Implementing this evaluation on raw latency samples is the conceptual foundation of an error-budget dashboard.
rand.Float64() < 0.02). At what fraction does the SLO breach? This is your 'error budget' boundary.
Use these three in order. Each builds on the one before.
In one paragraph, explain what an SLO is, how it differs from an SLA, and why error budgets matter.
Walk me through how Google's SRE book defines error budgets and how a team should respond when 50%, 75%, and 100% of the budget is consumed.
My API has two user-facing endpoints: /search (called 10× per session, p99 target 200ms) and /checkout (called once per purchase, p99 target 1000ms). How would you design separate SLOs for each, and how would you roll them up into a single service-level indicator for an executive dashboard?
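// main.go: evaluate an SLO against a latency sample window
package main

import (
	"fmt"
	"math/rand"
	"sort"
	"time"
)

const (
	sloPercentile  = 99   // p99
	sloThresholdMs = 300  // p99 must stay below 300 ms
	sloWindowSize  = 1000 // samples in the rolling window
)

// generateSamples returns n synthetic latencies in milliseconds:
// roughly 1% slow outliers, the rest fast.
func generateSamples(n int) []float64 {
	ms := make([]float64, 0, n)
	for i := 0; i < n; i++ {
		var v float64
		if rand.Float64() < 0.01 { // 1% slow requests
			v = 200 + rand.Float64()*600 // 200-800 ms
		} else {
			v = 30 + rand.Float64()*80 // 30-110 ms
		}
		ms = append(ms, v)
	}
	return ms
}

// evalSLO prints the window's p99, the SLO status, and error-budget usage.
func evalSLO(samples []float64) {
	if len(samples) == 0 {
		fmt.Println("Window: 0 samples, nothing to evaluate")
		return
	}

	// Sort a copy so the caller's slice is left untouched.
	s := make([]float64, len(samples))
	copy(s, samples)
	sort.Float64s(s)

	// Index of the p99 sample in the sorted window.
	idx := int(float64(sloPercentile) / 100 * float64(len(s)-1))
	p99 := s[idx]

	status := "✅ COMPLIANT"
	if p99 >= float64(sloThresholdMs) {
		status = "❌ BREACHED"
	}
	fmt.Printf("Window: %d samples\n", len(s))
	fmt.Printf("p%d: %.1f ms (threshold: < %d ms)\n",
		sloPercentile, p99, sloThresholdMs)
	fmt.Printf("SLO: %s\n", status)

	// Requests at or above the threshold consume error budget.
	burned := 0
	for _, v := range s {
		if v >= float64(sloThresholdMs) {
			burned++
		}
	}

	// A p99 objective allows (100 - 99)% of requests to exceed the threshold,
	// so the budget is that share of the window, not the whole window.
	budget := float64(len(s)) * float64(100-sloPercentile) / 100
	fmt.Printf("Budget: %d of %.0f allowed slow requests burned (%.1f%% of error budget)\n",
		burned, budget, float64(burned)/budget*100)
}

func main() {
	// rand.Seed is deprecated since Go 1.20, where the global source is seeded
	// automatically; it is kept so older toolchains also vary between runs.
	rand.Seed(time.Now().UnixNano())
	evalSLO(generateSamples(sloWindowSize))
}

go run main.go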