A server that works for 10 users breaks at 10,000. Rate limiting, caching, load balancing, queues — one module at a time.
A hundred challenges that take a working web server and turn it into something that survives real load. You'll write rate limiters, load balancers, caches, queues, and health checks — in Go, Python, Rust, or Node — and learn when each primitive is the right answer and when it's just extra complexity.
Built by Lakshya Kumar
We grant free access case-by-case — students, career-switchers, builders on a tight budget. Sign in to send us a note.
Complete all modules, then submit the required number of capstone projects. Each must earn a passing rating from an admin reviewer.
Take the Module 1 server and add: rate limiting, caching, a metrics endpoint, and structured logging. Load-test it at 10k rps on your laptop. Submit the code, a dashboard screenshot (or text table) of key metrics, and a writeup explaining every scaling decision.
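As one illustration of the rate-limiting piece, here is a minimal token-bucket sketch. It is not the course's reference solution: the class name `TokenBucket`, its parameters, and the injectable `clock` are assumptions made for this example, and a real server would keep one bucket per client and guard it with a lock.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second.

    Hypothetical helper for illustration; not thread-safe as written.
    """

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock        # injectable so tests can control time
        self.tokens = capacity    # start with a full bucket
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never exceeding capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A handler would call `allow()` once per request and answer with HTTP 429 when it returns `False`.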
Choose a real (or representative) database approaching its scale limits. Write a sharding design doc: shard key analysis, rebalancing plan, cross-shard query handling, a zero-downtime migration plan, and a rollback strategy. Include capacity math for projected 10x growth.
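The capacity math can start as a few lines of arithmetic. The sketch below sizes a cluster by whichever constraint is tighter, storage or write throughput. The function name `shards_needed`, the 70% headroom target, and all the numbers in the usage note are illustrative assumptions, not figures from the course.

```python
import math


def shards_needed(current_gb, peak_writes_per_s, growth,
                  shard_gb, shard_writes_per_s, headroom=0.7):
    """Return (by_storage, by_writes, required) shard counts.

    `headroom` is the fraction of each shard's capacity we plan to use,
    leaving the rest for spikes and rebalancing (an assumed policy).
    """
    by_storage = math.ceil(current_gb * growth / (shard_gb * headroom))
    by_writes = math.ceil(
        peak_writes_per_s * growth / (shard_writes_per_s * headroom)
    )
    # The cluster must satisfy both constraints, so take the larger.
    return by_storage, by_writes, max(by_storage, by_writes)
```

For example, 800 GB and 2,000 writes/s today, 10x growth, and shards that hold 1,000 GB and sustain 5,000 writes/s give `shards_needed(800, 2000, 10, 1000, 5000)`, which is storage-bound: 12 shards for storage versus 6 for writes.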
Overlaps with this course's distributed-leaning chapters. The single most-returned-to book on most backend teams.
Good for naming and comparing primitives. Pair with the load balancer and cache modules.
Paste this into any AI chat. Fill in the bracketed parts with your context — you'll get back a straight answer on whether this belongs on your plate.
I'm considering a "Scalable Systems" course. It starts from a working web server and adds, one module at a time: rate limiting, caching, load balancing, queueing, observability, database scaling, sharding, replication, degradation, and back-of-the-envelope capacity planning. Demos are in Go/Python/Rust/Node. Capstone: scale one service to 10k rps on a laptop.

Context:
1. Where I sit today: [e.g. "mid-level backend at 100-person startup", "solo dev shipping SaaS", "preparing for FAANG system-design interviews", "CTO of 5-person team"]
2. My actual scale today: [e.g. "100 users", "5 rps peak", "100k DAU but Postgres is on fire"]
3. My concrete pain right now: [describe it, or say "nothing yet, I just want to be ready"]

Tell me straight:
- Will this course solve my actual pain, or am I reaching for architecture when the fix is a missing index / better query / smaller instance? Diagnose from the context.
- If I only need to take 3 modules out of 10, which 3 match my situation and why?
- What's a concrete anti-pattern I might be tempted to apply (microservices, Kafka, K8s) that would hurt a team at my scale?
- How would I know, in 3 months, that the course actually paid off: what measurable change should I see in my system or my decisions?
Take a single-instance service and horizontally scale it to 10 instances behind a load balancer. Include session affinity removal, distributed config, idempotent writes, and a load test showing linear throughput scaling. Document the limits where additional scale stops helping.
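Idempotent writes are what make retries safe once a request can land on any of the 10 instances. A minimal sketch, assuming the client sends an idempotency key with each write: the `IdempotentStore` class is hypothetical, and its in-memory dict stands in for shared storage (for example, a database table with a unique constraint on the key) that every instance would consult.

```python
import threading


class IdempotentStore:
    """Run each keyed operation at most once and replay its result.

    Illustration only: a real deployment needs storage shared across
    instances, not a per-process dict.
    """

    def __init__(self):
        self._results = {}
        self._lock = threading.Lock()

    def apply(self, key, operation):
        # Holding the lock while the operation runs keeps the sketch
        # simple, at the cost of serializing all writes.
        with self._lock:
            if key in self._results:
                return self._results[key]   # retry: replay stored result
            result = operation()
            self._results[key] = result
            return result
```

A retried request with the same key returns the stored result instead of applying the write a second time.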
Implement a circuit breaker (Hystrix-style) and bulkhead isolation in a service that depends on 3 downstream services. Demonstrate the behavior when one downstream goes down: requests to it fail fast, while requests to the healthy ones still succeed. Include a chaos test.
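The fail-fast behavior can be sketched as a small state machine. This is a simplified illustration that covers the circuit breaker only, not bulkheads. The names `CircuitBreaker` and `CircuitOpenError`, the thresholds, and the injectable `clock` are assumptions for this example; it is not thread-safe and does not replicate Hystrix's rolling-window statistics.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=5.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None   # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("downstream marked unhealthy")
            # Timeout elapsed: half-open, let this trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            half_open = self.opened_at is not None
            if half_open or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # open, or re-open
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Using one breaker per downstream keeps a failing dependency from tripping calls to the healthy ones.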
Design an active-active deployment across 2 regions: data-replication strategy (conflict resolution, consistency model), latency-based routing, regional failover playbook, and a tested DR drill. Produce the design doc and prototype the routing + replication.
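One of the simplest conflict-resolution strategies to prototype is last-writer-wins (LWW), sketched below. The brief does not prescribe LWW; it is named here only as an example. The `Versioned` record and `merge` function are hypothetical, and the approach assumes reasonably synchronized clocks and accepts that the losing concurrent write is silently discarded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float   # write time as recorded by the originating region
    region: str        # deterministic tiebreaker for equal timestamps


def merge(a: Versioned, b: Versioned) -> Versioned:
    """Pick the later write; break timestamp ties by region name.

    The result does not depend on argument order, so both regions
    converge on the same value once they have seen both writes.
    """
    return max(a, b, key=lambda v: (v.timestamp, v.region))
```

If silently losing a write is unacceptable for the data in question, the design doc should name a different strategy and explain the trade-off.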
Free online. The networking chapters explain why scaling-at-the-edge works the way it does.