Where the first bottleneck appears

hard

Learn with your AI

Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.

Open in Claude Open in ChatGPT

Why this matters

Every server starts fast and ends slow. The first bottleneck on a real service is almost never CPU — it's a synchronous call to something slow. The database. A downstream HTTP call. A file read. A synchronous log write. Your handler waits 50ms on the DB, your server's capacity just became '1000 / (50 / 1000) = 20 req/s per connection' for that endpoint. The art of scaling is finding those synchronous calls and deciding: parallelize, cache, async, or eliminate.

Demo

The first bottleneck on any real service is almost never CPU — it's a synchronous call to something slow: a database, a downstream HTTP request, a file read. When your handler blocks for 50ms waiting on the DB, your server's per-connection throughput becomes 20 req/s no matter how many CPUs you have. Profiling that one path shows exactly where time is spent and why the fix is async, not a bigger machine.

Try it yourself

Run the server. Curl it a few times. Look at the X-Timing-MS response header. Which step is dominating the time?

Load test with wrk -c100 -d10s http://localhost:8080/users/42. Watch p99 latency.

Add a second synchronous operation — a tiny 5ms time.Sleep / setTimeout — inside the handler. Load test again. Predicted what happens?

In Python/Node (blocking DB), your event loop stalls. In Go/Rust, goroutines/tasks park. Which symptom shows up (higher latency vs lower throughput) in each?

Name one realistic optimization for this handler — caching, prepared statements, connection pooling, async DB driver. Pick one, try it, compare.

Prompt your AI

Use these three in order. Each builds on the one before.

1. Basics & terminology

Explain the phrase 'the first bottleneck is always I/O, not CPU' with a concrete example. Why does a 50ms database call dominate a handler that also does 500µs of JSON parsing?

2. Why it works (the mechanism)

Walk me through what happens to your process when a handler makes a synchronous DB call: syscall, waiting, return. In an event-loop runtime (Node/Python asyncio), why does a blocking call freeze everything, while an async call doesn't?

3. Advanced — application & what's next

You have a handler that does A (10ms DB), B (20ms HTTP), C (50ms DB). All three are independent. Before any refactor, what's the handler's minimum possible latency and what's its actual? Walk me through the three techniques to bring actual closer to minimum: parallelization, pipelining, caching.