How to create an LLM from scratch

100 challenges, from a scalar gradient to a fine-tuned model streaming behind your own API.


A rebuildable LLM course. Start with tensors and autograd (in Go, Python, and Rust), build a micrograd, train a bigram model, write attention from first principles, assemble a GPT, train it on TinyShakespeare, scale it with mixed precision and FlashAttention, fine-tune with LoRA and DPO, quantize to int4, and ship behind a streaming HTTP API. Every module has runnable code and a module-level project; the capstone is a small but real LLM you trained, fine-tuned, and deployed.

Built by Lakshya Kumar

Tags: llm, machine-learning, pytorch, engineering

Before you start (5 items)

Comfortable reading and writing Python. One year of programming experience in any language is enough.
Can install and use `python`, `pip`, and `torch`, and optionally `go`, `rustc`, and `node`. Some modules add Go, Rust, and Node code tabs for the engineering content around the model.
A GPU helps from Module 7 onward (T4 / 3090 / 4090 / A100). Modules 1–6 run fine on a laptop CPU. The capstone works with cloud GPU rentals (RunPod, vast.ai, Lambda) for <$20.
High-school calculus (derivatives and the chain rule). If that's rusty, the 3Blue1Brown calculus series is a one-evening refresh; Module 1 assumes you've done it.
No prior ML required. We build from scalars to self-attention. If you've read Karpathy's Zero-to-Hero you'll move faster but it's not required.
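As a quick self-check of the calculus prerequisite, the sketch below compares a hand-derived chain-rule gradient against a finite-difference estimate. It is illustrative only and not taken from the course material; the function `f(x) = sin(x^2)` and the helper names `analytic_grad` and `numeric_grad` are assumptions chosen for this example.

```python
import math


def f(x):
    # f(x) = sin(x^2): a composition of g(u) = sin(u) and u(x) = x^2
    return math.sin(x * x)


def analytic_grad(x):
    # Chain rule: df/dx = g'(u) * u'(x) = cos(x^2) * 2x
    return math.cos(x * x) * 2.0 * x


def numeric_grad(fn, x, h=1e-5):
    # Central finite difference: (fn(x + h) - fn(x - h)) / (2h)
    return (fn(x + h) - fn(x - h)) / (2.0 * h)


x = 1.5
print("analytic:", analytic_grad(x))
print("numeric: ", numeric_grad(f, x))
```

If the two printed values agree to several decimal places and the derivation in `analytic_grad` reads naturally, the calculus background is probably sufficient for the scalar-gradient material.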

Is this course for you?

Get access to How to create an LLM from scratch: $3.99 for 30-day access.

Prefer the whole catalog? See all-access membership.

Ask for access

We grant free access case-by-case — students, career-switchers, builders on a tight budget. Sign in to send us a note.

Capstone projects

Submit any 1 of 5 to earn the certificate

Complete all modules, then submit the required number of capstone projects. Each must earn a passing rating from an admin reviewer.

Build, train, fine-tune, and deploy a small GPT

End-to-end: BPE tokenizer on a dataset you picked, a 10–50M-param GPT trained to a measurable val loss, instruction fine-tune with LoRA, DPO preference tune, int4 quantize, deploy behind a streaming chat API with moderation and structured logs. Ship as a repo with README, measurements, sample outputs, and 5 concrete failure modes you discovered.

Minimum rating for approval: 3/5

Attention From Scratch in PyTorch

Further reading & study material (7 sources)

Paste this into any AI chat. Fill in the bracketed parts with your context — you'll get back a straight answer on whether this belongs on your plate.

Prompt

I'm considering a "How to create an LLM from scratch" course. It builds up from tensors/autograd, to neural nets, to tokenization, to attention, to a full Transformer, to training GPT on TinyShakespeare, to training-at-scale tricks (FlashAttention, ZeRO, mixed precision), to SFT + LoRA + DPO fine-tuning, to int4 quantization and deployment behind a streaming chat API. 100 challenges total. Python throughout, with Go + Rust + Node on the engineering/deployment modules via code tabs.

Context about me:
1. My current role/focus: [e.g. "backend dev who's curious about ML", "data scientist who only ever calls model.fit", "undergrad who's already watched Karpathy's videos once"]
2. The deepest I've gone into ML so far: [e.g. "nothing", "sklearn + XGBoost", "trained a CNN in PyTorch once", "fine-tuned Llama with a HuggingFace trainer"]
3. What I'm hoping this course changes about me: [e.g. "I can read an LLM paper and implement it", "I can deploy my own fine-tuned model at work", "I can start my own AI startup"]

Answer these:
- For my background, which 2 modules will give me the highest leverage in the next 3 months, and why?
- Name a concrete artifact I'd build during the course that I could actually use on my resume or at work.
- Is 60 hours worth it for me, or should I do something shorter first (just Karpathy, a course on ML basics, etc.)? Give your honest pick.
- What should I explicitly NOT expect — e.g. "you will not train a 70B model", "you will not beat GPT-4 at anything", "you will not learn RAG"?

How to create an LLM from scratch

Foundations: tensors, gradients, autograd

Language as data: tokenization, bigrams, and the LM loss

Neural nets from the ground up

Embeddings & the language-modeling task

Attention

The Transformer block

Building GPT — train your own

Training at scale

Fine-tuning & alignment

Shipping an LLM