
Databases & Storage Internals

100 challenges — from your first MongoDB insert to reading WAL files, tuning autovacuum, and orchestrating a zero-downtime schema migration on a 100M-row table.


Certificate: 1 of 5 capstones

Databases are the layer where correctness, durability, and performance intersect — and most developers interact with them through abstractions that hide the interesting parts. This course removes those abstractions. You'll write queries from scratch in MongoDB and PostgreSQL, understand exactly what happens when a query hits the planner, build intuition for B-trees and LSM trees, set up replication and measure lag, implement distributed consensus with etcd, tune autovacuum, and write production runbooks. By the end, 'my database is slow' will never again be a mystery — you'll have the vocabulary, the tools, and the mental models to diagnose and fix it.

Built by Lakshya Kumar

databases

postgresql

mongodb

storage-engines

replication

engineering

Before you start (4 items)

Comfortable writing code in at least one of Go, Python, Rust, or Node.js — the course shows all four side-by-side.
Familiar with running commands in a terminal, installing software, and starting a local server.
No prior database internals knowledge required — Modules 1 and 2 start from connect/insert/find.
A working Docker installation is recommended for spinning up local MongoDB and PostgreSQL instances.

Is this course for you? Ask an AI

Get access to Databases & Storage Internals

$3.99

30-day access

Prefer the whole catalog? See all-access membership.

Ask for access

We grant free access case-by-case — students, career-switchers, builders on a tight budget. Sign in to send us a note.

Capstone projects

Submit any 1 of 5 to earn the certificate

Complete all modules, then submit the required number of capstone projects. Each must earn a passing rating from an admin reviewer.

Polyglot Data Service

Build a user-facing HTTP API that stores user profiles and account balances in PostgreSQL (with correct indexes on hot-path queries, SELECT FOR UPDATE on balance updates, and a zero-downtime column migration) and activity events in MongoDB (with an aggregation pipeline for activity summary queries). Run both databases with replication (single-node replica sets locally), route reads to replicas, implement read-your-writes using consistency tokens, and document your shard key design rationale for when write volume grows 100×. Deliver: a working service, a 1-page system design doc, and a load test showing the read path handles 200 req/s.
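As a rough illustration of the read-your-writes requirement above, here is a minimal in-memory sketch of routing with consistency tokens. Everything in it is hypothetical: the `Cluster` and `Node` classes, the integer `lsn` token, and the manual `replicate` step stand in for real databases and real replication. In an actual service the token might be something like a PostgreSQL WAL position, but that choice is an assumption, not part of the capstone spec.

```python
class Node:
    """A toy database node that remembers how far it has applied the log."""

    def __init__(self, name):
        self.name = name
        self.applied_lsn = 0  # highest log sequence number applied so far
        self.rows = {}


class Cluster:
    """Hypothetical one-primary, N-replica cluster with token-based read routing."""

    def __init__(self, replica_count=2):
        self.primary = Node("primary")
        self.replicas = [Node(f"replica-{i}") for i in range(replica_count)]
        self.log = []  # ordered list of (lsn, key, value)

    def write(self, key, value):
        # Apply on the primary and hand the new LSN back as the consistency token.
        lsn = self.primary.applied_lsn + 1
        self.log.append((lsn, key, value))
        self.primary.rows[key] = value
        self.primary.applied_lsn = lsn
        return lsn

    def replicate(self, replica, up_to_lsn):
        # Simulate asynchronous replication catching up to a given LSN.
        for lsn, key, value in self.log:
            if replica.applied_lsn < lsn <= up_to_lsn:
                replica.rows[key] = value
                replica.applied_lsn = lsn

    def read(self, key, token=0):
        # Use the first replica that has applied at least the caller's token;
        # fall back to the primary when every replica is still behind.
        for replica in self.replicas:
            if replica.applied_lsn >= token:
                return replica.rows.get(key), replica.name
        return self.primary.rows.get(key), self.primary.name
```

A client would keep the token returned by `write` and pass it to later `read` calls. Reads without a token can land on a stale replica, which is exactly the anomaly the token is meant to prevent.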

Submit service and design doc. Minimum rating for approval: 3/5

B-Tree vs LSM Benchmark

Further reading & study material (5 sources)

Paste this into any AI chat. Fill in the bracketed parts with your context — you'll get back a straight answer on whether this belongs on your plate.

Prompt

I'm considering a 'Databases & Storage Internals' course — 100 challenges from MongoDB/PostgreSQL CRUD through indexing, replication, consensus, sharding, transactions, storage engines, and production operations. Ten modules, all four languages (Go, Python, Rust, Node.js) in parallel code tabs.

Context about me:
1. My database experience today: [e.g. "I write SQL but don't understand indexes", "I've set up a MongoDB Atlas cluster but never touched replication", "I'm a backend dev who's never looked at EXPLAIN ANALYZE"]
2. My current database pain: [e.g. "slow queries I can't diagnose", "terrified of data loss", "not sure when to use MongoDB vs PostgreSQL"]
3. What I want to be able to do after this course: [e.g. "diagnose any slow query in under 10 minutes", "design the right schema for a new product feature", "know what to do when the database pages at 3am"]

Answer:
- Which module will give me the fastest return on my current pain, and why?
- What's one thing I'll be able to do after this course that I genuinely cannot do today?
- Is there any prerequisite I'm missing that would make modules 6-8 harder for me?
- What will I NOT learn here that I might expect? (e.g. you will not learn Kubernetes, ORMs, or database administration for a specific cloud provider)

Build identical workloads on Postgres (B-tree) and RocksDB or LevelDB (LSM). Run write-heavy, read-heavy, and mixed workloads at 3 scale points. Produce a benchmark report with latency P50/P99, throughput, and storage footprint. Identify when each storage engine wins.
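For the latency and throughput figures in the benchmark report, one possible way to summarize raw measurements is sketched below. The helper names `percentile` and `summarize` are hypothetical, and the nearest-rank percentile definition is an assumption; other definitions interpolate between samples and give slightly different numbers, so the report should state which one it uses.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms, wall_seconds):
    """Summarize one benchmark run from per-operation latencies and total wall-clock time."""
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p99_ms": percentile(latencies_ms, 99),
        "throughput_ops_s": len(latencies_ms) / wall_seconds,
    }
```

Running `summarize` once per engine, workload, and scale point yields rows that can be compared directly in the report.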

Submit. Minimum rating for approval: 3/5

Write-Ahead Log Replay Recovery

Implement a tiny key-value store with a write-ahead log: durable writes via fsync, checkpoint compaction, and crash recovery via WAL replay. Crash-test it with simulated kills mid-write and prove no committed write is lost.
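To make the idea concrete, here is a minimal sketch of the append-fsync-replay loop, not a reference solution. The `TinyKV` class and its JSON-lines record format are assumptions chosen for brevity. Checkpoint compaction, record checksums, and fsyncing the containing directory are all left out and would be needed for the actual capstone.

```python
import json
import os


class TinyKV:
    """Toy key-value store: each put is appended to a WAL and fsynced before it is acknowledged."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.data = {}
        self._replay()
        self._wal = open(wal_path, "ab")

    def _replay(self):
        # Rebuild in-memory state from the log, stopping at the first torn record.
        if not os.path.exists(self.wal_path):
            return
        good_bytes = 0
        with open(self.wal_path, "rb") as f:
            for raw in f:
                if not raw.endswith(b"\n"):
                    break  # partial write at the tail: never acknowledged
                try:
                    record = json.loads(raw)
                except ValueError:
                    break
                self.data[record["k"]] = record["v"]
                good_bytes += len(raw)
        # Drop the torn tail so later appends start on a clean record boundary.
        with open(self.wal_path, "r+b") as f:
            f.truncate(good_bytes)

    def put(self, key, value):
        line = json.dumps({"k": key, "v": value}).encode("utf-8") + b"\n"
        self._wal.write(line)
        self._wal.flush()             # push Python's buffer to the OS
        os.fsync(self._wal.fileno())  # ask the OS to persist it to disk
        self.data[key] = value        # acknowledge only after the log is durable

    def get(self, key):
        return self.data.get(key)

    def close(self):
        self._wal.close()
```

A crash test in this style can append a truncated record to the file by hand, reopen the store, and check that every acknowledged key is still present while the torn record is ignored.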

Submit. Minimum rating for approval: 3/5

Query Plan Optimizer Deep-Dive

Take 5 real (or representative) slow queries on a 100M-row Postgres or MySQL table. Use EXPLAIN ANALYZE to identify the cost model's choices, implement at least three index/query rewrites, and produce a before/after report with latency, rows scanned, and explanation of why the optimizer chose each plan.

Submit. Minimum rating for approval: 3/5

Vector Search with pgvector

Build a semantic search service on Postgres with pgvector: ingest 1M embeddings (OpenAI or local model), implement HNSW indexing, keep query latency at P95 < 100 ms, and add hybrid retrieval that combines vector search with keyword filters. Compare the results to a Pinecone or Qdrant baseline.

Submit. Minimum rating for approval: 3/5

Primary reference for all PostgreSQL modules. Bookmark the monitoring, indexes, and MVCC sections.

Databases & Storage Internals

MongoDB: Queries from Zero to Advanced

PostgreSQL: Queries from Zero to Advanced

Indexing & Query Performance

MongoDB vs PostgreSQL: Choosing Your Storage

Replication & Read Scaling

Distributed Consensus & Leader Election

Sharding & Write Scaling

Transactions & Locking

LSM Trees, Write Amplification & Storage Engines

Database Observability & Production Operations