100 challenges — from your first MongoDB insert to reading WAL files, tuning autovacuum, and orchestrating a zero-downtime schema migration on a 100M-row table.
Databases are the layer where correctness, durability, and performance intersect — and most developers interact with them through abstractions that hide the interesting parts. This course removes those abstractions. You'll write queries from scratch in MongoDB and PostgreSQL, understand exactly what happens when a query hits the planner, build intuition for B-trees and LSM trees, set up replication and measure lag, implement distributed consensus with etcd, tune autovacuum, and write production runbooks. By the end, 'my database is slow' will never again be a mystery — you'll have the vocabulary, the tools, and the mental models to diagnose and fix it.
Built by Lakshya Kumar
We grant free access case-by-case — students, career-switchers, builders on a tight budget. Sign in to send us a note.
Complete all modules, then submit the required number of capstone projects. Each must earn a passing rating from an admin reviewer.
Build a user-facing HTTP API that stores user profiles and account balances in PostgreSQL (with correct indexes on hot-path queries, SELECT FOR UPDATE on balance updates, and a zero-downtime column migration) and activity events in MongoDB (with an aggregation pipeline for activity summary queries). Run both databases with replication (single-node replica sets locally), route reads to replicas, implement read-your-writes using consistency tokens, and document your shard key design rationale for when write volume grows 100×. Deliver: a working service, a 1-page system design doc, and a load test showing the read path handles 200 req/s.
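One way to implement the consistency tokens on the PostgreSQL side is to use WAL positions (LSNs). This is a minimal sketch that assumes the following flow. After a write, the service runs `SELECT pg_current_wal_lsn()` on the primary and returns the result to the client as its token. Before a read, the service compares that token with each replica's `SELECT pg_last_wal_replay_lsn()`. The database calls themselves are omitted, and `choose_backend` and the replica names are hypothetical. If replica LSNs are cached, stale values are always lower than the true ones, so stale data only sends more reads to the primary; it never breaks read-your-writes. On the MongoDB side, causally consistent sessions serve a similar purpose.

```python
from typing import Dict, Optional

def parse_lsn(lsn: str) -> int:
    """Convert a pg_lsn string such as '16/B374D848' into a 64-bit integer.
    The part before the slash is the high 32 bits and the part after is the low 32 bits, both hex."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def choose_backend(token: Optional[str],
                   replica_replay_lsns: Dict[str, Optional[str]]) -> str:
    """Hypothetical router: pick a replica that has replayed WAL at least up to
    the client's token, otherwise fall back to the primary."""
    if token is None:
        # No earlier write in this session, so any replica will do.
        return next(iter(replica_replay_lsns), "primary")
    needed = parse_lsn(token)
    for name, replay_lsn in replica_replay_lsns.items():
        # pg_last_wal_replay_lsn() returns NULL (None) on a non-standby server.
        if replay_lsn is not None and parse_lsn(replay_lsn) >= needed:
            return name
    return "primary"

replicas = {"replica-1": "16/B3000000", "replica-2": "16/B374D848"}
print(choose_backend("16/B374D848", replicas))  # replica-2 has caught up
print(choose_backend("17/0", replicas))         # no replica caught up: primary
```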
Paste this into any AI chat. Fill in the bracketed parts with your context — you'll get back a straight answer on whether this belongs on your plate.
I'm considering a 'Databases & Storage Internals' course: 100 challenges from MongoDB/PostgreSQL CRUD through indexing, replication, consensus, sharding, transactions, storage engines, and production operations. Ten modules, all four languages (Go, Python, Rust, Node.js) in parallel code tabs.

Context about me:
1. My database experience today: [e.g. "I write SQL but don't understand indexes", "I've set up a MongoDB Atlas cluster but never touched replication", "I'm a backend dev who's never looked at EXPLAIN ANALYZE"]
2. My current database pain: [e.g. "slow queries I can't diagnose", "terrified of data loss", "not sure when to use MongoDB vs PostgreSQL"]
3. What I want to be able to do after this course: [e.g. "diagnose any slow query in under 10 minutes", "design the right schema for a new product feature", "know what to do when the database pages at 3am"]

Answer:
- Which module will give me the fastest return on my current pain, and why?
- What's one thing I'll be able to do after this course that I genuinely cannot do today?
- Is there any prerequisite I'm missing that would make modules 6-8 harder for me?
- What will I NOT learn here that I might expect? (e.g. you will not learn Kubernetes, ORMs, or database administration for a specific cloud provider)
Build identical workloads on Postgres (B-tree) and RocksDB or LevelDB (LSM). Run write-heavy, read-heavy, and mixed workloads at 3 scale points. Produce a benchmark report with latency P50/P99, throughput, and storage footprint. Identify when each storage engine wins.
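A harness that can drive both engines helps keep the workloads comparable. This is a minimal sketch that assumes each workload is wrapped in a hypothetical callable `op(i)`, for example a closure around a Postgres INSERT or a RocksDB/LevelDB put made through whatever client binding you use. The harness times each call and reports P50/P99 latency with the nearest-rank percentile method, along with throughput. It does not measure storage footprint; you would take that separately from each engine, for example on-disk directory size.

```python
import math
import time
from typing import Callable, Dict, List

def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least p% of
    all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def run_workload(op: Callable[[int], None], n_ops: int) -> Dict[str, float]:
    """Call op(i) n_ops times, then report latency percentiles (ms) and throughput."""
    latencies_ms: List[float] = []
    start = time.perf_counter()
    for i in range(n_ops):
        t0 = time.perf_counter()
        op(i)
        latencies_ms.append((time.perf_counter() - t0) * 1000)
    elapsed = time.perf_counter() - start
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p99_ms": percentile(latencies_ms, 99),
        "throughput_ops": n_ops / elapsed if elapsed > 0 else float("inf"),
    }

# Stand-in workload: an in-memory dict instead of a real storage engine.
store: Dict[str, bytes] = {}
print(run_workload(lambda i: store.__setitem__(f"key{i}", b"x" * 100), 10_000))
```

Running the same `op` at each of the 3 scale points (for example, after preloading 10x more rows each time) keeps the comparison between engines consistent.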
Implement a tiny key-value store with a write-ahead log: durable writes via fsync, checkpoint compaction, and crash recovery via WAL replay. Crash-test it with simulated kills mid-write and prove no committed write is lost.
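Here is a minimal Python sketch of the WAL pattern. The record layout (length + CRC32 header, JSON payload), the JSON snapshot used as the checkpoint, and the file names `wal.log` and `snapshot.json` are illustrative assumptions, not a prescribed format. A put counts as committed only after `os.fsync` returns. Recovery works in three steps: load the snapshot, replay WAL records until the first torn or corrupt one, then truncate that tail so new appends follow the last valid record. The directory fsync after the rename assumes a POSIX system.

```python
import json
import os
import struct
import zlib

HEADER = struct.Struct("<II")  # (payload length, CRC32 of payload)

class WalKV:
    """Sketch of a KV store backed by a write-ahead log plus snapshot checkpoints."""

    def __init__(self, directory: str):
        self.dir = directory
        os.makedirs(directory, exist_ok=True)
        self.wal_path = os.path.join(directory, "wal.log")
        self.snap_path = os.path.join(directory, "snapshot.json")
        self.data = {}
        self._recover()
        self.wal = open(self.wal_path, "ab")

    def _recover(self) -> None:
        # 1. Start from the last checkpoint, if one exists.
        if os.path.exists(self.snap_path):
            with open(self.snap_path, "r", encoding="utf-8") as f:
                self.data = json.load(f)
        if not os.path.exists(self.wal_path):
            return
        # 2. Replay WAL records written after that checkpoint.
        with open(self.wal_path, "rb") as f:
            buf = f.read()
        good_end = 0
        while good_end + HEADER.size <= len(buf):
            length, crc = HEADER.unpack_from(buf, good_end)
            start = good_end + HEADER.size
            payload = buf[start:start + length]
            if len(payload) < length or zlib.crc32(payload) != crc:
                break  # torn or corrupt tail left by a crash mid-write
            rec = json.loads(payload)
            self.data[rec["k"]] = rec["v"]
            good_end = start + length
        # 3. Cut off the torn tail so later appends are not hidden behind garbage.
        with open(self.wal_path, "r+b") as f:
            f.truncate(good_end)
            f.flush()
            os.fsync(f.fileno())

    def put(self, key: str, value: str) -> None:
        payload = json.dumps({"k": key, "v": value}).encode("utf-8")
        self.wal.write(HEADER.pack(len(payload), zlib.crc32(payload)) + payload)
        self.wal.flush()             # Python buffer -> OS page cache
        os.fsync(self.wal.fileno())  # page cache -> stable storage: now committed
        self.data[key] = value

    def get(self, key: str):
        return self.data.get(key)

    def checkpoint(self) -> None:
        """Atomically write a snapshot, then empty the WAL."""
        tmp = self.snap_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.snap_path)  # atomic rename on POSIX
        dir_fd = os.open(self.dir, os.O_RDONLY)
        try:
            os.fsync(dir_fd)  # persist the rename itself (POSIX only)
        finally:
            os.close(dir_fd)
        # A crash at this point is safe: replaying the old WAL over the new
        # snapshot re-applies the same final values.
        self.wal.truncate(0)
        self.wal.flush()
        os.fsync(self.wal.fileno())

    def close(self) -> None:
        self.wal.close()
```

To crash-test, one approach is to run puts in a child process that logs each key only after `put` returns. Kill the child with SIGKILL at random points, reopen the store, and check that every logged key is present.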
Take 5 real (or representative) slow queries on a 100M-row Postgres or MySQL table. Use EXPLAIN ANALYZE to identify the cost model's choices, implement at least three index or query rewrites, and produce a before/after report covering latency, rows scanned, and an explanation of why the optimizer chose each plan.
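For the "rows scanned" column of the report, PostgreSQL's `EXPLAIN (ANALYZE, FORMAT JSON)` output can be summarized per plan node. This sketch covers PostgreSQL only; MySQL's EXPLAIN ANALYZE output has a different shape. It relies on the standard plan keys (`Node Type`, `Relation Name`, `Plan Rows`, `Actual Rows`, `Actual Loops`, `Rows Removed by Filter`, `Plans`). It assumes that actual and filtered row counts are per-loop averages, so it multiplies them by `Actual Loops`. The `misestimate` ratio is an illustrative metric, not something PostgreSQL reports.

```python
import json
from typing import Any, Dict, Iterator, List, Tuple, Union

def walk(node: Dict[str, Any], depth: int = 0) -> Iterator[Tuple[int, Dict[str, Any]]]:
    """Depth-first traversal of a plan tree; child nodes live under 'Plans'."""
    yield depth, node
    for child in node.get("Plans", []):
        yield from walk(child, depth + 1)

def summarize(explain_output: Union[str, list]) -> List[Dict[str, Any]]:
    """One summary row per plan node. Accepts the raw JSON text, or the list
    that some drivers return after decoding the JSON column themselves."""
    plan = json.loads(explain_output) if isinstance(explain_output, str) else explain_output
    summary = []
    for depth, node in walk(plan[0]["Plan"]):
        loops = node.get("Actual Loops", 1)
        returned = node.get("Actual Rows", 0) * loops
        removed = node.get("Rows Removed by Filter", 0) * loops
        estimated = node.get("Plan Rows", 0) * loops
        summary.append({
            "node": "  " * depth + node["Node Type"],
            "relation": node.get("Relation Name"),
            "rows_returned": returned,
            "rows_read": returned + removed,
            # A ratio far from 1 signals a bad row estimate, a common reason
            # the planner picks a poor scan or join strategy.
            "misestimate": max(returned, 1) / max(estimated, 1),
        })
    return summary
```

A Seq Scan whose `rows_read` is far larger than `rows_returned` is usually the first candidate for an index. Running the same summary before and after each rewrite gives the rows-scanned numbers for the report.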
Build a semantic search service on Postgres with pgvector: ingest 1M embeddings (from OpenAI or a local model), build an HNSW index, keep P95 query latency under 100ms, and implement hybrid retrieval that combines vector similarity with keyword filters. Compare the results against a Pinecone or Qdrant baseline.
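If the keyword side is treated as a second ranked search, you need some way to merge two ranked lists. For example, one list could come from a pgvector query ordered by the `<=>` cosine-distance operator and the other from Postgres full-text search. Reciprocal rank fusion (RRF) is one common way to merge them. It is a swapped-in technique here, not necessarily what the course expects. If "keyword filters" means hard WHERE-clause filters, those belong directly in the SQL instead. The constant `k = 60` is a conventional default and an assumption.

```python
from typing import Dict, List, Sequence

def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into a single ranking.
    Each list adds 1 / (k + rank) to a document's score, with rank starting at 1."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

vector_hits = ["a", "b", "c"]   # e.g. IDs from the HNSW-backed vector query
keyword_hits = ["c", "a", "d"]  # e.g. IDs from a full-text query
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))  # ['a', 'c', 'b', 'd']
```

RRF uses only ranks, not raw scores. That avoids having to normalize cosine distances and text-relevance scores onto a common scale.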
Primary reference for all PostgreSQL modules. Bookmark the monitoring, indexes, and MVCC sections.