Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.
Prompting (or in-context learning) means giving a pretrained model instructions and examples in the prompt, with no weight updates. Fine-tuning means updating the model's weights on your labeled data. Prompting is fast to iterate, cheap to start, and surprisingly powerful for well-defined tasks with clear instructions. Fine-tuning is slower and needs labeled data, but a fine-tuned model, often a much smaller one that needs no long prompt, can offer lower latency, lower cost per query, and higher accuracy when the task distribution is far from the model's pretraining data. Getting this decision wrong means either paying as much as 100× more per query than necessary, or spending weeks fine-tuning a model that a 5-line prompt would have matched.
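The cost trade-off above can be made concrete with a break-even calculation. This is a minimal sketch: the function name `break_even_queries` and every dollar figure below are hypothetical placeholders chosen for illustration, not measured prices.

```python
def break_even_queries(fixed_cost: float,
                       prompt_cost_per_query: float,
                       finetuned_cost_per_query: float) -> float:
    """Return the query count after which fine-tuning is cheaper overall."""
    saving = prompt_cost_per_query - finetuned_cost_per_query
    if saving <= 0:
        # Fine-tuning is not cheaper per query, so it never pays for itself.
        return float("inf")
    return fixed_cost / saving


# Hypothetical placeholder numbers (assumptions, not benchmarks).
FIXED_COST = 2_000.0       # one-off labeling and training cost, in dollars
PROMPT_COST = 0.002        # prompting a large hosted model, dollars per query
FINETUNED_COST = 0.00002   # serving a small fine-tuned model, 100x cheaper

queries = break_even_queries(FIXED_COST, PROMPT_COST, FINETUNED_COST)
print(f"Break-even after about {queries:,.0f} queries")
```

With these made-up inputs the break-even point is roughly one million queries. Below that volume the prompt is the cheaper option overall; above it the one-off fine-tuning cost is repaid.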
Zero-shot classification uses natural language entailment — the BART-large-MNLI model has never seen your label names but can infer 'finance' from 'The Fed raised rates' by reasoning about textual entailment. Fine-tuned classifiers learn explicit decision boundaries from labeled data and run at a fraction of the cost per query. The right choice turns on labeled data availability, label stability, and the inference budget.
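To see what "reasoning about textual entailment" amounts to, here is a small sketch of the scoring step with no model download required. It assumes the behavior of the Hugging Face zero-shot pipeline in single-label mode: each label is turned into a hypothesis with the default template "This example is {}.", and the entailment logits are passed through a softmax across the labels. The logits below are made up for illustration; a real NLI model would produce them.

```python
import math


def build_hypotheses(candidate_labels, template="This example is {}."):
    """Turn each label name into a natural-language hypothesis."""
    return [template.format(label) for label in candidate_labels]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


premise = "The Fed raised rates."
candidate_labels = ["finance", "sports", "science"]
hypotheses = build_hypotheses(candidate_labels)

# Hypothetical entailment logits, one per (premise, hypothesis) pair.
# These values are invented; they stand in for the NLI model's output.
entailment_logits = [3.1, -2.0, -0.5]

scores = softmax(entailment_logits)
ranked = sorted(zip(candidate_labels, scores),
                key=lambda pair: pair[1], reverse=True)

for hypothesis in hypotheses:
    print(f"premise: {premise!r}  hypothesis: {hypothesis!r}")
for label, score in ranked:
    print(f"{label:8s} {score:.2f}")
```

Because the label only ever appears inside a sentence, rewording it (say, "financial markets" instead of "finance") changes the hypothesis the model sees, which is why label wording affects the scores.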
# pip install transformers datasets torch
from transformers import pipeline
# Approach 1: Zero-shot prompting (no training data needed)
zero_shot = pipeline("zero-shot-classification",
                     model="facebook/bart-large-mnli")
texts = [
    "The stock market crashed after the Fed raised rates.",
    "Manchester United won 3-0 against Arsenal.",
    "Scientists discover new exoplanet in habitable zone.",
]
labels = ["finance", "sports", "science"]
print("Zero-shot classification:")
for text in texts:
    result = zero_shot(text, candidate_labels=labels)
    # Labels and scores come back sorted by score, highest first.
    top = result["labels"][0]
    score = result["scores"][0]
    print(f" [{top:8s} {score:.2f}] {text[:50]}")
# Approach 2: Few-shot prompting (GPT-style, no weight updates)
prompt_template = """Classify the following text as finance, sports, or science.
Examples:
"Fed raises interest rates by 25 basis points." → finance
"LeBron James scores 40 points in playoff win." → sports
"CRISPR used to reverse genetic blindness in mice." → science
Text: "{text}"
Answer:"""
print("\nFew-shot prompt (send to any LLM API):")
for text in texts:
    # Show only the first 200 characters of each filled-in prompt as a preview.
    print(prompt_template.format(text=text[:60])[:200])
    print("---")
python3 main.py
Add 'politics' to the zero-shot classifier and try 'Congress passes new infrastructure bill.' Does it classify correctly? Zero-shot works because the MNLI model understands natural language entailment, not because it knows your label names.
Try 'Apple stock rose 5% after the product launch.' Does zero-shot correctly classify it as finance over science? Adjust the label names to see how label wording affects confidence scores.
Compare zero-shot classification with a fine-tuned model loaded via pipeline('sentiment-analysis'). Which is more accurate on domain-specific text (e.g., technical product reviews)?
Use these three in order. Each builds on the one before.
In one paragraph, explain the difference between prompting and fine-tuning a language model. When does prompting fail — what kinds of tasks reliably require fine-tuning?
Walk me through what 'in-context learning' means mechanically: when you include 3 examples in the prompt (few-shot), what is the model doing with those examples? Is it updating its weights? How does it 'learn' from them?
I'm building a customer support classifier for 80 fine-grained intent categories (e.g., 'billing dispute', 'password reset', 'shipping delay'). I have 500 labeled examples per category. Compare: (1) few-shot prompting with GPT-4, (2) fine-tuning a BERT-base classifier, (3) fine-tuning a Mistral-7B with LoRA. For each: expected accuracy, cost to build, cost per query at 1M queries/day, and latency.