Capstok — learn by doing

Why this matters

Optimising an API without a baseline measurement is not engineering — it is guessing with extra steps. A benchmark script captures a latency distribution (p50/p95/p99) before any change, creating the reference point against which every future fix is compared. Without it, a 'performance improvement' that cuts average latency from 60ms to 55ms might be within noise — or it might have been caused by a traffic drop rather than your code change. Benchmarks also reveal whether slowness is consistent (an infrastructure or algorithmic problem) or spiky (GC pause, cold start, connection pool exhaustion) — a distinction that points to completely different fixes.

Demo

Latency distributions reveal patterns that single-sample measurements hide: a bimodal distribution often indicates separate cache-hit and cache-miss paths; a long tail suggests GC pauses or connection-pool starvation; a narrow, tight distribution points to a fixed per-request cost. Running sequential requests and grouping the measured latencies into fixed-width buckets produces a rough histogram that can reveal these shapes without a full observability stack.
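A plain-text histogram is enough to see these shapes. The following is a minimal sketch that assumes the latencies come from the benchmark script in this lesson (its `ms` slice). Here they are replaced by a small hand-made sample with a hypothetical cache-hit cluster near 10ms and a cache-miss cluster near 80ms. The `histogram` helper and the 10ms bucket width are illustrative choices.

```go
package main

import (
	"fmt"
	"strings"
)

// histogram counts latencies (ms) into fixed-width buckets.
// Bucket i covers [i*width, (i+1)*width); anything beyond the last
// bucket is clamped into it, so the last bar means "this value or more".
func histogram(ms []float64, width float64, buckets int) []int {
	counts := make([]int, buckets)
	for _, v := range ms {
		i := int(v / width)
		if i < 0 {
			i = 0
		}
		if i >= buckets {
			i = buckets - 1
		}
		counts[i]++
	}
	return counts
}

func main() {
	// Hypothetical sample, not a measurement: hits near 10ms, misses near 80ms.
	ms := []float64{9, 10, 11, 10, 12, 9, 11, 78, 82, 80, 10, 85}
	width := 10.0
	counts := histogram(ms, width, 10)
	for i, c := range counts {
		label := fmt.Sprintf("%3.0f-%3.0fms", float64(i)*width, float64(i+1)*width)
		if i == len(counts)-1 {
			label = fmt.Sprintf("%3.0fms+   ", float64(i)*width) // overflow bucket
		}
		fmt.Printf("%s | %s\n", label, strings.Repeat("#", c))
	}
}
```

Choose the bucket width relative to your p50. If the buckets are too wide, two modes merge into one bar. If they are too narrow, each bar holds only one or two samples.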

Try it yourself

Run your benchmark against https://httpbin.org/get with n=100 and record the output. Then run it again immediately — do the numbers change? Explain why p99 might vary between runs even against the same endpoint.
Add --delay support to the script: on a randomly chosen 5% of requests, call time.Sleep for a random 0–100ms inside the timed region to simulate a realistic tail. Observe how p99 jumps while p50 barely moves.
Run the benchmark before and after adding a 'warm-up' phase (the first 5 requests discarded from the stats). Compare p50, avg and max: the warm-up removes the one-off DNS/TCP/TLS setup cost from the distribution. The client reuses its keep-alive connection, so typically only the first request pays that cost, and avg and max should move more than p50. How big is the effect on your target endpoint?

Prompt your AI

Use these three in order. Each builds on the one before.

1. Basics & terminology

In one paragraph, explain why benchmarking with sequential requests (one at a time) differs from a load test (concurrent requests) and when to use each.

2. Why it works (the mechanism)

Why does the first request in a benchmark typically show higher latency than subsequent ones? List every warm-up effect that could inflate the first sample.

3. Advanced — application & what's next

I ran my benchmark before and after a 'caching improvement'. Before: p99=840ms. After: p99=820ms. My sample size was 50 requests. Is this improvement real? How many samples would I need to be 95% confident the improvement is at least 50ms?

References

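// main.go: minimal latency benchmark with p50/p95/p99 output
package main

import (
	"flag"
	"fmt"
	"io"
	"net/http"
	"os"
	"sort"
	"time"
)

// pct returns the p-th percentile of an already-sorted, non-empty slice
// using the "lower" rule: index = floor(p/100 * (len-1)).
func pct(sorted []float64, p float64) float64 {
	return sorted[int(p/100*float64(len(sorted)-1))]
}

func main() {
	n := flag.Int("n", 100, "number of requests")
	url := flag.String("url", "https://httpbin.org/get", "target URL")
	flag.Parse()

	client := &http.Client{Timeout: 10 * time.Second}
	var ms []float64 // latency of each successful request, in milliseconds
	errs := 0

	for i := 0; i < *n; i++ {
		t0 := time.Now()
		resp, err := client.Get(*url)
		if err != nil {
			fmt.Fprintln(os.Stderr, "err:", err)
			errs++
			continue
		}
		// Drain and close the body: the timing then covers the full response,
		// and the keep-alive connection can be reused by the next request.
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
		// Keep sub-millisecond precision; Duration.Milliseconds() truncates.
		ms = append(ms, float64(time.Since(t0).Microseconds())/1000)
	}

	// Guard against every request failing (pct and ms[len(ms)-1] would panic).
	if len(ms) == 0 {
		fmt.Fprintln(os.Stderr, "no successful requests")
		os.Exit(1)
	}

	sort.Float64s(ms)
	var sum float64
	for _, v := range ms {
		sum += v
	}

	fmt.Printf("n=%d  ok=%d  errors=%d  url=%s\n", *n, len(ms), errs, *url)
	fmt.Printf("avg=%.1fms  p50=%.1fms  p95=%.1fms  p99=%.1fms  max=%.1fms\n",
		sum/float64(len(ms)), pct(ms, 50), pct(ms, 95), pct(ms, 99), ms[len(ms)-1])
}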
// Run: go run main.go -n 200 -url https://YOUR_API/health


Your First Latency Benchmark — Measure Before You Fix