Optimising an API without a baseline measurement is not engineering — it is guessing with extra steps. A benchmark script captures a latency distribution (p50/p95/p99) before any change, creating the reference point against which every future fix is compared. Without it, a 'performance improvement' that cuts average latency from 60ms to 55ms might be within noise — or it might have been caused by a traffic drop rather than your code change. Benchmarks also reveal whether slowness is consistent (an infrastructure or algorithmic problem) or spiky (GC pause, cold start, connection pool exhaustion) — a distinction that points to completely different fixes.
Latency distributions reveal patterns invisible to single-sample measurements: bimodal distributions often indicate cache hit vs miss paths; long tails suggest GC pauses or connection pool starvation; a tight, single-peaked distribution points to a fixed per-request cost. Running sequential requests and grouping their latencies into percentile bands produces a rough histogram that can identify these shapes without requiring a full observability stack.
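The grouping into percentile bands can be sketched as below. This is a minimal illustration, not part of the lesson's script: the `band` type, the `percentileBands` function, the cut points (p50/p95/p99) and the sample latencies are all assumptions chosen for the example. A wide gap between one band's upper edge and the next band's lower edge hints at a second mode, and a top band that spans a large range hints at a long tail.

```go
package main

import (
	"fmt"
	"sort"
)

// band describes the samples that fall between two percentile cut points.
// (Hypothetical type, introduced only for this sketch.)
type band struct {
	label  string
	lo, hi float64 // smallest and largest latency in the band, in ms
	count  int
}

// percentileBands splits latencies into p0-p50, p50-p95, p95-p99 and p99-p100.
// The cut points are an illustrative choice, not a standard.
func percentileBands(ms []float64) []band {
	sorted := append([]float64(nil), ms...) // copy so the caller's slice is untouched
	sort.Float64s(sorted)

	cuts := []int{0, 50, 95, 99, 100}
	var out []band
	for i := 0; i+1 < len(cuts); i++ {
		// Integer arithmetic avoids floating-point rounding at the band edges.
		start := cuts[i] * len(sorted) / 100
		end := cuts[i+1] * len(sorted) / 100
		if end <= start {
			continue // too few samples to populate this band
		}
		out = append(out, band{
			label: fmt.Sprintf("p%d-p%d", cuts[i], cuts[i+1]),
			lo:    sorted[start],
			hi:    sorted[end-1],
			count: end - start,
		})
	}
	return out
}

func main() {
	// Made-up samples: a fast cluster, a slower cluster, and one outlier.
	ms := []float64{
		41, 42, 42, 43, 43, 44, 44, 45, 45, 46,
		88, 89, 90, 90, 91, 92, 92, 93, 95, 410,
	}
	for _, b := range percentileBands(ms) {
		fmt.Printf("%-9s n=%-3d %7.1f .. %-7.1f ms\n", b.label, b.count, b.lo, b.hi)
	}
}
```

With only 20 samples the p95-p99 band is empty and is skipped, which is itself a reminder that tail percentiles need a larger sample to mean anything.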
Run the script against https://httpbin.org/get with n=100 and record the output. Then run it again immediately: do the numbers change? Explain why p99 might vary between runs even against the same endpoint.
Add --delay support to the script (introduce a setTimeout/sleep of 0–100ms randomly on 5% of requests) to simulate a realistic tail. Observe how p99 jumps while p50 barely moves.

Use these three prompts in order. Each builds on the one before.
In one paragraph, explain why benchmarking with sequential requests (one at a time) differs from a load test (concurrent requests) and when to use each.
Why does the first request in a benchmark typically show higher latency than subsequent ones? List every warm-up effect that could inflate the first sample.
I ran my benchmark before and after a 'caching improvement'. Before: p99=840ms. After: p99=820ms. My sample size was 50 requests. Is this improvement real? How many samples would I need to be 95% confident the improvement is at least 50ms?
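// main.go: minimal latency benchmark with p50/p95/p99 output
package main

import (
	"flag"
	"fmt"
	"io"
	"net/http"
	"os"
	"sort"
	"time"
)

// pct returns the p-th percentile of an ascending-sorted, non-empty slice
// using a floor-index lookup (no interpolation between samples).
func pct(sorted []float64, p float64) float64 {
	return sorted[int(p/100*float64(len(sorted)-1))]
}

func main() {
	n := flag.Int("n", 100, "number of requests")
	url := flag.String("url", "https://httpbin.org/get", "target URL")
	flag.Parse()

	client := &http.Client{Timeout: 10 * time.Second}
	var ms []float64 // latency of each successful request, in milliseconds
	for i := 0; i < *n; i++ {
		t0 := time.Now()
		resp, err := client.Get(*url)
		if err != nil {
			fmt.Println("err:", err)
			continue // failed requests are excluded from the distribution
		}
		// Drain and close the body so the timing covers the full response
		// and the connection can be reused by the next request.
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
		// Keep sub-millisecond precision instead of truncating to whole ms.
		ms = append(ms, float64(time.Since(t0))/float64(time.Millisecond))
	}
	// Without this guard, pct and ms[len(ms)-1] would panic on an empty slice.
	if len(ms) == 0 {
		fmt.Println("no successful requests; nothing to report")
		os.Exit(1)
	}

	sort.Float64s(ms)
	var sum float64
	for _, v := range ms {
		sum += v
	}
	fmt.Printf("n=%d ok=%d url=%s\n", *n, len(ms), *url)
	fmt.Printf("avg=%.1fms p50=%.0fms p95=%.0fms p99=%.0fms max=%.0fms\n",
		sum/float64(len(ms)), pct(ms, 50), pct(ms, 95), pct(ms, 99), ms[len(ms)-1])
}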
// Run: go run main.go -n 200 -url https://YOUR_API/health
// Run with defaults: go run main.go