Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.
Average latency is the most dangerous metric in API performance because it hides tail misery behind a reassuring number. If 990 requests finish in 10 ms and 10 requests take 5,000 ms, the average is 59.9 ms, a number that sounds healthy while one percent of users wait five seconds. Percentiles fix this: p50 (the median) tells you the experience of your typical user; p95 is the latency that 19 in 20 requests beat, so it tells you about the 1 in 20 who are struggling; p99 marks the boundary of your worst 1 in 100. SLOs are usually written in percentiles, for example 'p99 < 500 ms over a rolling 30-day window', because an average-based SLO can be satisfied by a workload dominated by trivial fast requests while the complex ones time out.
curl's -w (--write-out) flag prints a per-phase HTTP waterfall directly from the terminal: given a format string of timing variables, it reports DNS lookup, TCP connect, TLS handshake, server wait (TTFB), and total transfer time. Each value is cumulative from the start of the request, so subtracting neighbouring values gives the duration of a single phase. Because it runs anywhere you have a shell, it is the fastest way to confirm whether a slowdown is geographic, network-layer, or application-side, without touching production code or waiting for a dashboard.
Try it: make only 1% of requests slow (rand.Intn(100) == 0 in Go, random.random() < 0.01 in Python) and observe how the average barely moves while p99 remains elevated.
Use these three prompts in order. Each builds on the one before.
In one paragraph, explain what a percentile is and why p99 latency matters more than average latency for API monitoring.
Explain how HDR Histogram works and why it's preferred over naive sorted-array percentile computation in high-throughput production systems.
My team's SLO is 'p99 < 200 ms'. I see average latency of 45 ms and p99 of 190 ms. A new feature is about to add 20 ms of server-side work. Walk me through how to estimate whether we'll breach the SLO and what to instrument before shipping.
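// main.go: measure 200 requests and compare the average with percentiles
package main

import (
	"fmt"
	"io"
	"math/rand"
	"net/http"
	"os"
	"sort"
	"time"
)

// percentile returns the value at the p-th percentile of an ascending-sorted,
// non-empty slice, taking the sample at index floor(p/100 * (len-1)).
func percentile(sorted []float64, p float64) float64 {
	idx := int(p / 100 * float64(len(sorted)-1))
	return sorted[idx]
}

func main() {
	const n = 200
	var ms []float64
	// A timeout keeps one hung request from stalling the whole run.
	client := &http.Client{Timeout: 10 * time.Second}
	for i := 0; i < n; i++ {
		// Mix fast and slow endpoints to create a realistic tail
		url := "https://httpbin.org/get"
		if rand.Intn(10) == 0 {
			url = "https://httpbin.org/delay/0.5" // 10% slow requests
		}
		start := time.Now()
		resp, err := client.Get(url)
		if err != nil {
			// Skip failed requests instead of dereferencing a nil response.
			fmt.Fprintln(os.Stderr, "request failed:", err)
			continue
		}
		io.Copy(io.Discard, resp.Body) // drain the body so the timing covers the full transfer
		resp.Body.Close()
		ms = append(ms, float64(time.Since(start).Milliseconds()))
	}
	if len(ms) == 0 {
		fmt.Fprintln(os.Stderr, "no successful requests to measure")
		os.Exit(1)
	}
	sort.Float64s(ms)
	sum := 0.0
	for _, v := range ms {
		sum += v
	}
	// Report over the requests that actually succeeded, not the attempted count.
	fmt.Printf("n=%d requests\n", len(ms))
	fmt.Printf("Average: %.1f ms ← deceptively low\n", sum/float64(len(ms)))
	fmt.Printf("p50: %.0f ms\n", percentile(ms, 50))
	fmt.Printf("p95: %.0f ms\n", percentile(ms, 95))
	fmt.Printf("p99: %.0f ms\n", percentile(ms, 99))
}

go run main.go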