If you can't measure latency, you can't improve it. 'Average latency' is a lie: one 2-second request among a thousand 20ms requests moves the average by only about 2ms, yet that request took 100x longer than every other one. p50 and p99 still read 20ms; the outlier appears only at the very top of the distribution, past p99.9. What matters is the distribution: p50, p90, p95, p99, p99.9. You need a load-testing tool that reports all of them. wrk is the classic; hey is a friendlier version; oha is the modern Rust rewrite with a pretty UI. Learning to read a percentile distribution is half the job.
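To see how a single outlier affects each statistic, it helps to compute them directly. The sketch below uses made-up data (999 requests at 20ms plus one at 2000ms) and a nearest-rank percentile, which is one common definition and an assumption here; real tools may use a different one.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    # round() guards against float noise such as 999.0000000000001 before taking the ceiling.
    rank = max(1, math.ceil(round(p * len(ordered) / 100, 9)))
    return ordered[rank - 1]


# Hypothetical data: 999 fast requests and one 2-second outlier.
latencies_ms = [20.0] * 999 + [2000.0]

mean_ms = statistics.mean(latencies_ms)
print(f"mean   = {mean_ms:.2f} ms")
for p in (50, 90, 95, 99, 99.9, 100):
    print(f"p{p:<5} = {percentile(latencies_ms, p):.0f} ms")
```

The mean shifts by about 2ms, while every percentile up to p99.9 stays at 20ms and only the maximum shows the 2-second request. Make the outlier more frequent (say 20 in 1000) and p99 starts to report it.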
Load-testing tools differ in how they model concurrency, measure latency, and report percentiles — so the same server produces different-looking numbers depending on which tool you choose. Running hey, oha, and wrk against the same endpoint makes the differences concrete: request rate, connection model, and histogram bucketing are all configurable, and each tool's defaults make different trade-offs. Calibrating with all three means you trust the numbers you report.
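One reason the numbers can look different is that "p99" has more than one definition. The sketch below applies three plausible definitions to the same made-up samples. It does not reproduce the algorithm of wrk, hey, or oha; the function names and the 25ms bucket width are illustrative assumptions.

```python
import math


def nearest_rank(samples, p):
    """Smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(round(p * len(ordered) / 100, 9)))
    return ordered[rank - 1]


def linear_interpolation(samples, p):
    """Interpolate between the two samples that straddle position (n - 1) * p / 100."""
    ordered = sorted(samples)
    pos = (len(ordered) - 1) * p / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def bucketed(samples, p, width_ms):
    """Nearest-rank value rounded up to the upper edge of its fixed-width histogram bucket."""
    return math.ceil(nearest_rank(samples, p) / width_ms) * width_ms


# Hypothetical data: 99 requests at 10 ms and one at 300 ms.
latencies_ms = [10.0] * 99 + [300.0]

print("nearest rank :", nearest_rank(latencies_ms, 99))
print("interpolated :", round(linear_interpolation(latencies_ms, 99), 2))
print("25 ms buckets:", bucketed(latencies_ms, 99, 25))
```

All three answers are defensible readings of "p99" for the same 100 samples, which is why it is worth knowing what a tool does before comparing its output with another's.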
Install oha (cargo install oha, or brew install oha). Run it against your server. Look at the pretty TUI.
Run wrk and hey for comparison. Note which gives you percentiles, which gives you a histogram, and which is easier to read.
Use these three in order. Each builds on the one before.
Explain why 'average latency' is a misleading metric. Use a concrete example with a distribution of response times.
Walk me through how a load tester like wrk or oha actually works: it opens N concurrent connections, sends requests as fast as the server accepts them, records per-request latencies, and at the end computes percentiles. Where can the load tester itself become the bottleneck?
Your service has p50=10ms, p99=300ms, and you're told to fix the tail. The symptoms: most requests are fast, 1% are 30x slower. Give me a checklist of the 5 most likely causes, in order of how often they're the real answer (GC pause? DB lock? slow disk? retry storm? cold cache?).
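The closed loop described in the second prompt above (open N connections, send requests as fast as the server answers, record per-request latencies, compute percentiles at the end) can be sketched in a few dozen lines. This is a toy under stated assumptions, not how wrk or oha are implemented: it uses threads and Python's standard library, and it starts its own throwaway server (the hypothetical OkHandler) so the example is self-contained.

```python
import http.client
import math
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class OkHandler(BaseHTTPRequestHandler):
    """Throwaway target server for the demo; always answers 200 'ok'."""
    protocol_version = "HTTP/1.1"    # keep-alive, so each worker reuses one connection
    disable_nagle_algorithm = True   # avoid small-write stalls on loopback

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def worker(host, port, path, deadline, out):
    """One connection: send, wait for the full response, record the latency, repeat."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    latencies = []
    while time.perf_counter() < deadline:
        start = time.perf_counter()
        conn.request("GET", path)
        conn.getresponse().read()
        latencies.append((time.perf_counter() - start) * 1000)
    conn.close()
    out.extend(latencies)


def run_load(host, port, path, connections, seconds):
    """Run `connections` workers until the deadline and return sorted latencies in ms."""
    deadline = time.perf_counter() + seconds
    results = []
    threads = [
        threading.Thread(target=worker, args=(host, port, path, deadline, results))
        for _ in range(connections)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)


def percentile(ordered, p):
    """Nearest-rank percentile over an already sorted list."""
    rank = max(1, math.ceil(round(p * len(ordered) / 100, 9)))
    return ordered[rank - 1]


server = ThreadingHTTPServer(("127.0.0.1", 0), OkHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

latencies = run_load("127.0.0.1", server.server_address[1], "/users/42",
                     connections=8, seconds=0.5)
server.shutdown()
server.server_close()

print(f"{len(latencies)} requests")
for p in (50, 90, 99):
    print(f"p{p} = {percentile(latencies, p):.2f} ms")
```

In this sketch the client and server share one Python process, so the measured latencies say as much about the tester as about the server. That is one concrete way a load tester becomes the bottleneck: when it cannot generate or time requests faster than the target can serve them.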
# Install wrk: brew install wrk
# Install hey: brew install hey
# Install oha: brew install oha (or: cargo install oha)
# Hit your server (any language):
# Add --latency to also print wrk's latency distribution (50%, 75%, 90%, 99%).
wrk -c100 -t4 -d10s http://localhost:8080/users/42
# Output:
# Running 10s test @ http://localhost:8080/users/42
#   Thread Stats   Avg      Stdev     Max   +/- Stdev
#     Latency    12.45ms    3.10ms   41.22ms   77.01%
# Requests/sec:   7942.18
hey -n 10000 -c 100 http://localhost:8080/users/42
# Output includes an ASCII histogram of latencies and a latency distribution (10% through 99%).
oha -c 100 -z 10s http://localhost:8080/users/42
# Live TUI: requests/sec, status code breakdown, full percentile list.
go run main.go