Capstok — learn by doing

Why this matters

Average latency is the most dangerous metric in API performance because it hides tail misery behind a reassuring number. If 990 requests finish in 10 ms and 10 requests take 5,000 ms, the average is 59.9 ms: a number that sounds healthy while one percent of users wait five seconds. Percentiles fix this. p50 (the median) describes your typical user; p95 is the latency that 1 in 20 requests exceed; p99 is the latency that only 1 in 100 exceed. SLOs are typically written in percentiles, for example 'p99 < 500 ms over a rolling 30-day window', because an average-based SLO can be met by a workload dominated by trivial fast requests while the complex ones time out.

Demo

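curl's -w (--write-out) flag prints per-phase timings for a single request directly from the terminal. With a format string such as -w 'dns=%{time_namelookup} connect=%{time_connect} tls=%{time_appconnect} ttfb=%{time_starttransfer} total=%{time_total}\n' you get a labelled waterfall: DNS lookup, TCP connect, TLS handshake, time to first byte (TTFB, which includes server wait), and total transfer. Each value is cumulative seconds since the request started, so the time spent in one phase is the difference between adjacent values. Because it runs anywhere you have a shell, it is a quick way to check whether a slowdown is geographic, network-layer, or application-side, without touching production code or waiting for a dashboard.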

Try it yourself

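Run the demo and note the gap between the average and p99. Now change the slow-request fraction from 10% to 1% (rand.Intn(100) == 0 in Go, random.random() < 0.01 in Python) and re-run. The average falls to roughly 5 ms above the fast-request baseline, so it barely registers that some users still wait an extra half-second. Check p99 as well: with only about 2 slow requests expected in 200, it often will not catch them, and the result changes from run to run. That is a first taste of the sample-size problem in the last exercise.
Set the fraction back to 10% and change the slow delay from 0.5 s to 2 s (httpbin.org/delay/2). Re-run. How much does the average climb? How much does p99 climb? Compute the ratio: the average should rise by only about a tenth of the extra 1.5 s, while p99 rises by roughly the full 1.5 s, which is why p99 is the more sensitive signal.
Write a quick function that computes p999 (the 99.9th percentile). For 200 samples the nearest-rank method returns the 200th value, which is simply the maximum (the demo's percentile() returns the 199th). Explain why you need thousands of samples before p999 is statistically meaningful.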

Prompt your AI

Use these three in order. Each builds on the one before.

1. Basics & terminology

In one paragraph, explain what a percentile is and why p99 latency matters more than average latency for API monitoring.

2. Why it works (the mechanism)

Explain how HDR Histogram works and why it's preferred over naive sorted-array percentile computation in high-throughput production systems.

3. Advanced — application & what's next

My team's SLO is 'p99 < 200 ms'. I see average latency of 45 ms and p99 of 190 ms. A new feature is about to add 20 ms of server-side work. Walk me through how to estimate whether we'll breach the SLO and what to instrument before shipping.

Demo code

// main.go — measure 200 requests and compare avg vs percentiles
package main

import (
	"fmt"
	"io"
	"math/rand"
	"net/http"
	"sort"
	"time"
)

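// percentile returns the p-th percentile of an ascending-sorted slice by
// truncating p/100*(n-1) to an index (no interpolation).
func percentile(sorted []float64, p float64) float64 {
	idx := int(p / 100 * float64(len(sorted)-1))
	return sorted[idx]
}

func main() {
	const n = 200
	ms := make([]float64, 0, n)
	// A timeout keeps one hung request from stalling the whole run.
	client := &http.Client{Timeout: 10 * time.Second}

	for i := 0; i < n; i++ {
		// Mix fast and slow endpoints to create a realistic tail.
		// Since Go 1.20 the global math/rand source is seeded randomly;
		// older versions repeat the same sequence on every run.
		var url string
		if rand.Intn(10) == 0 {
			url = "https://httpbin.org/delay/0.5" // ~10% slow requests
		} else {
			url = "https://httpbin.org/get"
		}
		start := time.Now()
		resp, err := client.Get(url)
		if err != nil {
			// Skip failed requests instead of dereferencing a nil response.
			fmt.Println("request failed:", err)
			continue
		}
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
		ms = append(ms, float64(time.Since(start).Microseconds())/1000)
	}
	if len(ms) == 0 {
		fmt.Println("no successful requests")
		return
	}

	sort.Float64s(ms)
	sum := 0.0
	for _, v := range ms {
		sum += v
	}

	fmt.Printf("n=%d successful requests\n", len(ms))
	fmt.Printf("Average: %.1f ms  ← deceptively low\n", sum/float64(len(ms)))
	fmt.Printf("p50:     %.0f ms\n", percentile(ms, 50))
	fmt.Printf("p95:     %.0f ms\n", percentile(ms, 95))
	fmt.Printf("p99:     %.0f ms\n", percentile(ms, 99))
}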

Run: go run main.go

The Lie of Averages — p50, p95, and p99 Percentiles