Debugging Slow APIs

100 challenges — from reading a curl waterfall to finding the index that makes your p99 drop from 4s to 40ms.

Free preview

A diagnostic course built around the question 'why is this API slow?' Every challenge gives you a clue, a measurement technique, or a broken system to fix. You'll build fluency in the vocabulary (TTFB, turnaround time, p99, error budget), the instruments (curl -w, Server-Timing headers, Chrome DevTools waterfall, EXPLAIN), and the systematic approach (network first, then DB, then code, then infrastructure). By Module 10 you will have a repeatable playbook for any slow endpoint.
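As a rough illustration of one of the instruments named above, here is a minimal sketch that reads a Server-Timing response header into per-metric durations. The function name `parse_server_timing` and the sample header values are assumptions made for this example, not course material. It handles only the simple case where descriptions contain no commas or semicolons.

```python
def parse_server_timing(header: str) -> dict[str, float]:
    """Return {metric_name: duration_ms} for entries that carry a dur parameter.

    Simplification (assumption): quoted descriptions containing ',' or ';'
    are not supported.
    """
    durations: dict[str, float] = {}
    for entry in header.split(","):
        parts = [part.strip() for part in entry.split(";")]
        name = parts[0]
        if not name:
            continue
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "dur":
                try:
                    durations[name] = float(value.strip().strip('"'))
                except ValueError:
                    pass  # ignore malformed durations instead of failing
    return durations


if __name__ == "__main__":
    # Hypothetical header value, as a server might send it.
    sample = 'db;dur=53, app;dur=47.2, cache;desc="Cache Read";dur=23.2'
    for metric, ms in parse_server_timing(sample).items():
        print(f"{metric}: {ms} ms")
```

Sorting the resulting metrics by duration is a quick way to decide which layer to inspect first.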

Built by Lakshya Kumar

Tags: performance, api, debugging, observability, databases, engineering

Before you start (3 items)

You can write and run a script in at least one of: Go, Python, Rust, or Node.js. The course does not require knowing all four — pick one.
You have used curl before. You don't need to know its flags — Module 1 teaches those.
You have at least a rough idea of what a database, a web server, and an HTTP request are. No systems-programming background needed.

Is this course for you? Ask an AI

Paste this into any AI chat. Fill in the bracketed parts with your context — you'll get back a straight answer on whether this belongs on your plate.

Get access to Debugging Slow APIs

$3.99

30-day access

Prefer the whole catalog? See all-access membership.

Ask for access

We grant free access case-by-case — students, career-switchers, builders on a tight budget. Sign in to send us a note.

Capstone projects

Submit any 1 of 5 to earn the certificate

Complete all modules, then submit the required number of capstone projects. Each must earn a passing rating from an admin reviewer.

capstone: End-to-end API performance audit

Pick any API (your own project, an open-source backend, or a public API). Run a full performance audit: baseline the p50/p95/p99 with your benchmark script, identify the single biggest bottleneck using the techniques from the course, fix it, and prove the improvement with a before/after benchmark. Deliver a written report that names every tool used, every suspect eliminated, and the measured result.

Submit audit report (minimum rating for approval: 3/5)
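The before/after baseline in this audit could be summarized with something like the sketch below. It uses the nearest-rank percentile definition, which is one of several common definitions and an assumption here. The function names and the sample latencies are made up for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def summarize(samples_ms):
    """Return the p50/p95/p99 of a list of latencies in milliseconds."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}


if __name__ == "__main__":
    # Hypothetical latencies (ms) before and after a fix.
    before = [40, 42, 45, 50, 60, 80, 120, 900, 2500, 4000]
    after = [38, 39, 40, 41, 41, 42, 43, 44, 45, 47]
    print("before:", summarize(before))
    print("after: ", summarize(after))
```

With few samples the p99 is simply the slowest request, so a benchmark run needs enough requests for the tail numbers to mean anything.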

Further reading & study material (4 sources)

Prompt

I'm considering a "Debugging Slow APIs" course. It starts with measurement vocabulary (TTFB, p99, SLOs), works through the network layer, database bottlenecks, server-side profiling, caching, external dependencies, serverless cold starts, observability, and load testing, and finishes with a systematic debugging playbook.

Context about me:
1. My current role: [e.g. "backend dev", "full-stack engineer", "DevOps/SRE", "frontend dev who gets blamed for slow APIs"]
2. The slowest API problem I've personally dealt with: [e.g. "never debugged one", "a query that took 8s with no indexes", "a cold start on Lambda that ruined our checkout"]
3. What I'm hoping this changes: [e.g. "I can diagnose any slow API in under 30 minutes", "I stop guessing and start measuring", "I can write SLOs my team actually uses"]

Answer these:
- For my background, which module will give me the fastest ROI in the next month, and why?
- Name one concrete thing I'll be able to do after this course that I can't do today.
- Is there a faster path for someone who only cares about one layer (e.g. just DB, just network)?
- What will I NOT learn here that I might expect? (e.g. "you will not learn how to provision infrastructure", "you will not learn Kubernetes")

n-plus-one-elimination: Eliminate the N+1 Query Pattern

Find an API endpoint with N+1 queries (yours or from a sample app). Implement at least three solutions (DataLoader, prefetch joins, query batching) and benchmark each. Produce a report showing latency reduction at P50/P95/P99 and the cost trade-offs of each approach.

Submit (minimum rating for approval: 3/5)
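To make the N+1 pattern concrete, here is a minimal sketch using an in-memory SQLite database. The `authors`/`posts` schema, the sample rows, and the function names are all assumptions invented for this example. It shows only the query-batching fix, not DataLoader or prefetch joins.

```python
import sqlite3


def make_db():
    """Create a tiny in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
        INSERT INTO posts VALUES (1, 1, 'A'), (2, 2, 'B'), (3, 1, 'C'), (4, 3, 'D');
    """)
    return conn


def posts_n_plus_one(conn):
    """One query for the posts, then one more query per post for its author."""
    posts = conn.execute("SELECT id, author_id, title FROM posts ORDER BY id").fetchall()
    queries = 1
    result = []
    for _post_id, author_id, title in posts:
        row = conn.execute("SELECT name FROM authors WHERE id = ?", (author_id,)).fetchone()
        queries += 1
        result.append((title, row[0]))
    return result, queries


def posts_batched(conn):
    """One query for the posts, then a single IN (...) query for all authors."""
    posts = conn.execute("SELECT id, author_id, title FROM posts ORDER BY id").fetchall()
    queries = 1
    author_ids = sorted({author_id for _, author_id, _ in posts})
    names = {}
    if author_ids:
        placeholders = ",".join("?" for _ in author_ids)
        rows = conn.execute(
            f"SELECT id, name FROM authors WHERE id IN ({placeholders})", author_ids
        ).fetchall()
        queries += 1
        names = dict(rows)
    return [(title, names[author_id]) for _, author_id, title in posts], queries


if __name__ == "__main__":
    conn = make_db()
    print(posts_n_plus_one(conn))
    print(posts_batched(conn))
```

Both functions return the same rows. The first issues one query per post plus one, while the second stays at two queries however many posts there are.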

tail-latency-tuning: Tail Latency Tuning

Take an API with P99 latency more than 5x its P50. Identify the cause via profiling (GC pauses, lock contention, slow downstream, large response). Implement at least three optimizations, measure the result, and prove the new P99 is within 2x of P50.

Submit (minimum rating for approval: 3/5)

cache-layer-design: Multi-Tier Cache Layer Design

Design and implement a 3-tier cache (in-process, Redis, CDN) for a high-traffic endpoint. Include cache invalidation strategy, stampede protection (singleflight), and a measurement showing cache hit rate, origin load reduction, and stale-while-revalidate behavior.

Submit (minimum rating for approval: 3/5)
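The stampede protection mentioned in this capstone can be sketched in-process as follows. This is a simplified, thread-based take on the singleflight idea, not the Go package of that name and not a distributed implementation. The class and method names are assumptions for this example.

```python
import threading


class _Call:
    """State shared between the leader and the callers waiting on it."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None


class SingleFlight:
    """Collapse concurrent calls for the same key into one execution of fn."""

    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}  # key -> _Call currently in flight

    def do(self, key, fn):
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._calls[key] = call
        if leader:
            try:
                call.result = fn()
            except BaseException as exc:  # waiters should see the same failure
                call.error = exc
            finally:
                with self._lock:
                    del self._calls[key]
                call.done.set()
        else:
            call.done.wait()
        if call.error is not None:
            raise call.error
        return call.result
```

A cache miss handler would wrap its origin fetch as `flight.do(cache_key, load_from_origin)`, where `load_from_origin` is a hypothetical loader. While one fetch for a key is in flight, other threads asking for that key wait for its result instead of hitting the origin themselves.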

rate-limit-design: Production-Grade Rate Limit Design

Build a rate limiter for an API: token-bucket per user + global, distributed via Redis with Lua script for atomicity. Include burst handling, fair-share across tiers, an admin-overridable allowlist, and a 10k-rps load test demonstrating correct throttling without false positives.

Submit (minimum rating for approval: 3/5)
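The token-bucket core of such a limiter might look like the single-process sketch below. It deliberately leaves out everything distributed (Redis, the Lua script, per-tier fair share, the allowlist), and the class name, parameters, and injectable clock are assumptions made so the example stays small and testable.

```python
import time


class TokenBucket:
    """In-process token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self._clock = clock            # injectable so tests can control time
        self._tokens = float(capacity)  # start full, which allows an initial burst
        self._last = clock()

    def allow(self, cost=1.0):
        """Return True and spend `cost` tokens if available, else return False."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the elapsed time, never exceeding the burst capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=5, capacity=10)
    allowed = sum(bucket.allow() for _ in range(20))
    print(f"{allowed} of 20 back-to-back requests allowed")
```

Here `capacity` bounds the burst size and `rate` sets the sustained throughput. A per-user plus global scheme would check one bucket per user and one shared bucket before admitting a request.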

Free online. The network-layer module draws heavily from chapters 2, 4, and 9.

Debugging Slow APIs

Measuring What Matters

The Network Layer — DNS, TCP, TLS, and Distance

Database Bottlenecks — Indexes, N+1, and Connection Pools

Server-Side Slowness — CPU, I/O, and GC Pauses

Caching — Strategy, Stampedes, and Invalidation

External Dependencies — Timeouts, Retries, and Circuit Breakers

Serverless and Cold Starts

Observability — Tracing, Logs, and Dashboards

Load Testing — Finding Your Breaking Point

The Debugging Playbook — End-to-End Methodology