
Scalable Systems: From One Server to Ten Thousand

A server that works for 10 users breaks at 10,000. Rate limiting, caching, load balancing, queues — one module at a time.

Free preview

Certificate: 1 of 5 capstones

A hundred challenges that take a working web server and turn it into something that survives real load. You'll write rate limiters, load balancers, caches, queues, and health checks — in Go, Python, Rust, or Node — and learn when each primitive is the right answer and when it's just extra complexity.

Built by Lakshya Kumar

engineering

systems

scalability

backend

infra

Before you start (4 items)

You have written and run a backend service — even a hobby one — that listens on a port.
Comfortable in one of Go, Python, Rust, or Node.js. Pick your favourite; every demo has tabs for all four.
You know roughly what HTTP, JSON, and a database connection are. You don't need to have tuned one.
Docker installed and working — several modules need Redis / Postgres running locally via `docker run`.

Is this course for you? Ask an AI

Get access to Scalable Systems: From One Server to Ten Thousand

$3.99

30-day access

Prefer the whole catalog? See all-access membership.

Ask for access

We grant free access case-by-case — students, career-switchers, builders on a tight budget. Sign in to send us a note.

Capstone projects

Submit any 1 of 5 to earn the certificate

Complete all modules, then submit the required number of capstone projects. Each must earn a passing rating from an admin reviewer.

Scale one service to 10k rps

Take the Module 1 server and add: rate limiting, caching, a metrics endpoint, and structured logging. Load-test it at 10k rps on your laptop. Submit the code, a dashboard screenshot (or text table) of key metrics, and a writeup explaining every scaling decision.

Submit your scaled service

Minimum rating for approval: 3/5
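Rate limiting is the first addition this capstone asks for. As a rough illustration only, a token bucket (one of the algorithms named in the rate limiting module) might be sketched in Python as below. The class name `TokenBucket`, its parameters, and the injectable `clock` are assumptions made for this example, not the course's reference solution.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.clock = clock          # injectable so tests can control time
        self.tokens = capacity      # start full: allow an initial burst
        self.last = clock()

    def allow(self, cost=1):
        """Return True and spend `cost` tokens if the request may proceed."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A server would typically keep one bucket per client key (API token, IP address) and answer with HTTP 429 when `allow()` returns False. This sketch is single-threaded; a real service would need a lock or a shared store such as Redis.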

Sharding Design Document

Choose a real (or representative) database that is approaching its scale limits. Write a sharding design doc covering shard key analysis, a rebalancing plan, cross-shard query handling, a zero-downtime migration plan, and a rollback strategy. Include capacity math for projected 10x growth.
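To give a feel for the "capacity math" part, here is a small worked sketch in Python. Every number in it (2 TB of data, 1,500 writes/s, a 4 TB and 5,000 writes/s per-shard limit, 70% target utilization) is a made-up assumption for illustration; a real design doc would substitute measured figures.

```python
import math

TB = 10**12


def shards_needed(current, growth_factor, per_shard_limit, headroom=0.7):
    """Shards required so each stays under `headroom` of its limit after growth."""
    projected = current * growth_factor
    usable_per_shard = per_shard_limit * headroom
    return math.ceil(projected / usable_per_shard)


# Hypothetical inputs: size the cluster for 10x growth on two dimensions.
storage_shards = shards_needed(2 * TB, 10, 4 * TB)   # 20 TB over 2.8 TB usable
write_shards = shards_needed(1_500, 10, 5_000)       # 15,000 w/s over 3,500 usable

# The tighter constraint decides the shard count.
shards = max(storage_shards, write_shards)
print(storage_shards, write_shards, shards)  # prints: 8 5 8
```

In this made-up case storage, not write throughput, is the binding constraint, which is the kind of conclusion the design doc should state explicitly.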

Further reading & study material (5 sources)

Designing Data-Intensive Applications (Kleppmann)
book
Overlaps with this course's distributed-leaning chapters. The single most-returned-to book on most backend teams.
System Design Interview — Alex Xu (vol 1 & 2)
book
Good for naming and comparing primitives. Pair with the load balancer and cache modules.

Paste this into any AI chat. Fill in the bracketed parts with your context — you'll get back a straight answer on whether this belongs on your plate.

Prompt

I'm considering a "Scalable Systems" course. It starts from a working web server and adds, one module at a time: rate limiting, caching, load balancing, queueing, observability, database scaling, sharding, replication, degradation, and back-of-the-envelope capacity planning. Demos are in Go/Python/Rust/Node. Capstone: scale one service to 10k rps on a laptop.

Context:
1. Where I sit today: [e.g. "mid-level backend at 100-person startup", "solo dev shipping SaaS", "preparing for FAANG system-design interviews", "CTO of 5-person team"]
2. My actual scale today: [e.g. "100 users", "5 rps peak", "100k DAU but Postgres is on fire"]
3. My concrete pain right now: [describe it, or say "nothing yet, I just want to be ready"]

Tell me straight:
- Will this course solve my actual pain, or am I reaching for architecture when the fix is a missing index / better query / smaller instance? Diagnose from the context.
- If I only need to take 3 modules out of 10, which 3 match my situation and why?
- What's a concrete anti-pattern I might be tempted to apply (microservices, Kafka, K8s) that would hurt a team at my scale?
- How would I know, in 3 months, that the course actually paid off — what measurable change should I see in my system or my decisions?

Submit

Minimum rating for approval: 3/5

Horizontally Scale a Stateless Service

Take a single-instance service and horizontally scale it to 10 instances behind a load balancer. Include session affinity removal, distributed config, idempotent writes, and a load test showing linear throughput scaling. Document the limits where additional scale stops helping.

Submit

Minimum rating for approval: 3/5

Circuit Breaker and Bulkhead Pattern

Implement a circuit breaker (Hystrix-style) and bulkhead isolation in a service that depends on 3 downstream services. Demonstrate the behavior when one downstream goes down: requests to it fail fast, while requests to the healthy ones still succeed. Include a chaos test.

Submit

Minimum rating for approval: 3/5
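The "fail fast" behavior this capstone describes can be sketched roughly as follows. This is a minimal, single-threaded illustration of the circuit breaker half only (no bulkhead); the names `CircuitBreaker` and `CircuitOpenError`, the thresholds, and the injectable `clock` are assumptions for the example and are not taken from Hystrix or from the course material.

```python
import time


class CircuitOpenError(Exception):
    """Raised instead of calling a downstream whose circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: reject immediately without touching the downstream.
                raise CircuitOpenError("circuit open, failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Using one breaker instance per downstream is what keeps a dead dependency from affecting calls to the healthy ones. Bulkhead isolation would add a separate concurrency limit (for example a bounded pool or semaphore) per downstream on top of this.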

Multi-Region Active-Active Design

Design an active-active deployment across 2 regions: data-replication strategy (conflict resolution, consistency model), latency-based routing, regional failover playbook, and a tested DR drill. Produce the design doc and prototype the routing + replication.

Submit

Minimum rating for approval: 3/5

High Performance Browser Networking (Grigorik)

book

Free online. The networking chapters explain why scaling-at-the-edge works the way it does.

Scalable Systems: From One Server to Ten Thousand

Your first server

Rate limiting — token bucket, leaky bucket, sliding window

Load balancing — round-robin, least-conn, consistent hashing

Caching — LRU, TTL, cache invalidation

Queues & async work — why sync dies

Database scaling — read replicas, partitioning, sharding

Horizontal scaling & service discovery

Observability — metrics, tracing, SLOs

Failure modes — retries, circuit breakers, backpressure

Capstone — scale one service to 10k rps
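The load balancing module in the outline above names consistent hashing. As a rough idea of what that primitive looks like, here is a minimal Python sketch of a hash ring with virtual nodes. The class name `HashRing`, the choice of SHA-256, and the default of 100 virtual nodes are assumptions made for this example, not details from the course.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring: each node owns many points ("virtual nodes")."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (point, node)
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        # First 8 bytes of SHA-256 as an integer position on the ring.
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def node_for(self, key):
        # The first point clockwise from the key's position owns the key.
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._ring[idx][1]


keys = [f"user:{i}" for i in range(1000)]
before = HashRing(["a", "b", "c"])
after = HashRing(["a", "b", "c", "d"])
moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
# Every key that changed owner moved to the new node "d"; the rest stayed put.
print(all(after.node_for(k) == "d" for k in moved))  # prints: True
```

The property shown at the end is the reason to prefer this over `hash(key) % n`: adding a node relocates only the keys the new node takes over, instead of reshuffling almost everything.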