Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.
The reason your server scales (or doesn't) is almost always about how it handles concurrency. Go uses lightweight goroutines scheduled onto OS threads. Python has historically been limited by the GIL, so modern Python leans on asyncio for I/O and multiple worker processes for parallelism. Rust uses native OS threads or an async runtime (Tokio). Node runs a single-threaded event loop with worker threads for CPU work. Each model has a sweet spot and a failure mode. Knowing which one you're working in is the first step to knowing how many connections your laptop can actually handle.
A handler that sleeps for 100ms, under 50 concurrent requests, exposes the difference between threading, async, and the event loop in numbers, not just in theory. Go schedules goroutines onto threads and handles them all; Python's asyncio suspends coroutines cooperatively; Node's event loop queues them as callbacks; Rust's Tokio also suspends tasks cooperatively, using zero-cost futures. The throughput numbers you see are why these models exist.
Install wrk (brew install wrk on mac). Run your server, then: wrk -c100 -t4 -d10s http://localhost:8080/slow. Note the req/sec.
Repeat with -c10000 connections. With the Go server, the numbers hold. In Python/Node with a blocking handler, try the same: the server likely hangs.
Count the server's threads: ps -M <pid> on mac, top -H -p <pid> on linux. Surprising?
Use these three in order. Each builds on the one before.
Explain Go's goroutines, Python's asyncio, Rust's Tokio, and Node's event loop as concurrency models. Where does each put its concurrency — threads, tasks, callbacks?
A single Go process on a 4-core machine can run tens of thousands of goroutines. How? Walk me through the Go runtime's M:N scheduler: how goroutines map to OS threads, how the scheduler preempts, and why the cost of a goroutine is measured in kilobytes, not megabytes.
When your service spends 95% of its time on I/O (DB, downstream HTTP), any async model works. When it spends 95% on CPU, the model matters a lot. Pick a realistic service (image resizer, data transformer, ML inference) and explain which concurrency model is wrong for it, and why.
// goroutines: preemptively scheduled, lightweight, ~2KB initial stacks
package main

import (
	"fmt"
	"log"
	"net/http"
	"sync/atomic"
	"time"
)

// inflight counts the requests currently inside the handler.
var inflight atomic.Int64

func main() {
	http.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
		n := inflight.Add(1)
		defer inflight.Add(-1)
		time.Sleep(100 * time.Millisecond) // stands in for slow I/O
		fmt.Fprintf(w, "done (inflight: %d)\n", n)
	})
	// ListenAndServe only returns on failure, so surface the error.
	log.Fatal(http.ListenAndServe(":8080", nil))
}

// Run the server: go run main.go
// Load test:      wrk -c100 -t4 -d10s http://localhost:8080/slow
// -> ~950 req/sec, inflight climbs to ~100