Visual regression testing — catch unintended UI changes automatically

hard

Why this matters

Functional tests verify that buttons work. Visual regression tests verify that the page looks correct. A CSS change that accidentally shifts a checkout button 10 pixels to the right, hides text under a nav bar, or changes the font size in a critical CTA might pass all functional tests but ship as a visual defect. Visual regression testing works by taking a screenshot of each page state, comparing it pixel by pixel to a baseline (the approved version), and failing when the difference exceeds a configured tolerance. It catches the class of bugs that would otherwise be noticed only by a human reviewer, and because it is automated it can run on every commit.
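To make the comparison step concrete, here is a minimal sketch of a pixel-by-pixel diff over two raw RGBA buffers. The names RawImage, countDiffPixels, matchesBaseline, and solidImage are hypothetical helpers written for this illustration, and the per-channel check is a simplification: Playwright's own comparator measures perceived color difference rather than raw channel deltas.

```typescript
// Hypothetical in-memory image: 4 bytes (R, G, B, A) per pixel, row by row.
interface RawImage {
  width: number;
  height: number;
  data: Uint8Array; // length = width * height * 4
}

// Count pixels whose largest per-channel delta exceeds channelTolerance (0-255).
function countDiffPixels(baseline: RawImage, actual: RawImage, channelTolerance = 0): number {
  if (baseline.width !== actual.width || baseline.height !== actual.height) {
    throw new Error('Image dimensions differ; treat this as a failed comparison');
  }
  let diff = 0;
  for (let i = 0; i < baseline.data.length; i += 4) {
    let maxDelta = 0;
    for (let c = 0; c < 4; c++) {
      maxDelta = Math.max(maxDelta, Math.abs(baseline.data[i + c] - actual.data[i + c]));
    }
    if (maxDelta > channelTolerance) diff++;
  }
  return diff;
}

// The test passes when the number of differing pixels stays within the budget.
function matchesBaseline(baseline: RawImage, actual: RawImage, maxDiffPixels: number): boolean {
  return countDiffPixels(baseline, actual) <= maxDiffPixels;
}

// Build a single-color image for the demonstration below.
function solidImage(width: number, height: number, rgba: number[]): RawImage {
  const data = new Uint8Array(width * height * 4);
  for (let i = 0; i < data.length; i += 4) data.set(rgba, i);
  return { width, height, data };
}

const baselineImage = solidImage(4, 4, [255, 255, 255, 255]);
const actualImage = solidImage(4, 4, [255, 255, 255, 255]);
actualImage.data.set([0, 0, 0, 255], 0); // one pixel turned black

console.log(countDiffPixels(baselineImage, actualImage));     // 1
console.log(matchesBaseline(baselineImage, actualImage, 0));  // false
console.log(matchesBaseline(baselineImage, actualImage, 50)); // true
```

The pixel budget is the design choice that matters: a budget of 0 flags every rendering difference, while a small positive budget absorbs anti-aliasing noise at the cost of missing very small regressions.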

Demo

Playwright's toHaveScreenshot() writes a baseline image on the first run and diffs subsequent runs against it, failing when the difference exceeds the configured tolerance (maxDiffPixels, maxDiffPixelRatio, or threshold). Functional tests verify behavior; visual regression tests verify appearance. A CSS regression that shifts a checkout button off-screen can pass every functional assertion while shipping a broken UI to users. Running snapshot comparisons in CI catches a class of bugs that selector-based assertions are not designed to detect.

import { test, expect } from '@playwright/test';

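// Playwright's toHaveScreenshot() compares against stored baselines.
// First run: baselines are written, and the test is reported as failed
// because there was nothing to compare against. Subsequent runs: compared.
// Update baselines after an intentional change: npx playwright test --update-snapshots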

test('homepage visual regression', async ({ page }) => {
  await page.goto('/');  // relative URL; requires use.baseURL in playwright.config.ts

  // Full-page screenshot comparison
  await expect(page).toHaveScreenshot('homepage.png', {
    maxDiffPixels: 50,           // allow up to 50 differing pixels
    threshold: 0.2,              // 0-1 per-pixel color tolerance (YIQ color space); 0 = strict, 1 = lax
    fullPage: true,
  });
});

test('checkout button is visible and styled correctly', async ({ page }) => {
  await page.goto('/checkout');

  // Component-level screenshot (just one element)
  const checkoutBtn = page.getByRole('button', { name: 'Place Order' });
  await expect(checkoutBtn).toHaveScreenshot('checkout-btn.png');
});

test('error state visual — form validation', async ({ page }) => {
  await page.goto('/contact');
  await page.getByRole('button', { name: 'Submit' }).click();  // submit empty form

  // Capture the error state
  await expect(page.locator('form')).toHaveScreenshot('contact-form-errors.png', {
    maxDiffPixels: 0,  // zero tolerance for error state — must match exactly
  });
});

// CI tip: baselines are specific to a browser and an OS (Playwright adds both
// to the snapshot file name), so create and compare them in the same
// environment, e.g. the same CI image, and pin a single browser project:
// npx playwright test --project=chromium

Run: npx playwright test
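The demo relies on two pieces of project configuration: a base URL, so that page.goto('/') resolves, and a project named chromium, so that --project=chromium selects something. A minimal playwright.config.ts sketch follows; the localhost address and the tolerance values are assumptions to adapt, not recommendations.

```typescript
// playwright.config.ts (sketch; adjust the values to your project)
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  use: {
    // Assumed address of a locally running app; lets tests call page.goto('/').
    baseURL: 'http://localhost:3000',
  },
  expect: {
    // Defaults applied to every toHaveScreenshot() call unless a test overrides them.
    toHaveScreenshot: {
      maxDiffPixels: 50,
      threshold: 0.2,
      animations: 'disabled',
    },
  },
  projects: [
    // Matches the --project=chromium flag used in the CI tip.
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
  ],
});
```

Setting the tolerances once in the config keeps individual tests short; a test that needs stricter matching, such as the error-state example, can still pass its own options.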

Try it yourself

Run a visual regression test for the first time so that the baseline screenshots are created. Then make a small CSS change (e.g., change a button's border-radius from 4px to 8px). Re-run the test. Does it fail? Open the HTML report (npx playwright show-report) and inspect the expected, actual, and diff images.

Set maxDiffPixels: 0 and re-run on a page that has a timestamp, animation, or random element. Watch it fail. Then call page.evaluate(() => document.querySelector('.timestamp')?.remove()) before the screenshot to remove the dynamic content, or pass mask: [page.locator('.timestamp')] to toHaveScreenshot() to cover the element without changing the layout. Re-run; it should pass.

Research Percy.io and Applitools Eyes — two commercial visual regression services. How do they handle the 'different OS renders fonts differently' problem? What do they do that Playwright's built-in screenshot comparison doesn't?

Design a visual regression strategy for a design system with 50 components. Which components need their own visual snapshot? What page states (hover, focus, disabled, error) should be captured? How would you handle dark mode?
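For the dynamic-content exercise above, it can help to see what masking does at the pixel level. The sketch below paints the same rectangle a solid color in both images before comparing, which is conceptually what a mask overlay achieves. RawImage, Rect, applyMask, and countDiffPixels are hypothetical helpers written for this illustration, not Playwright APIs.

```typescript
// Hypothetical in-memory image: 4 bytes (R, G, B, A) per pixel, row by row.
interface RawImage {
  width: number;
  height: number;
  data: Uint8Array;
}

interface Rect { x: number; y: number; width: number; height: number; }

// Return a copy of the image with each rectangle painted a solid color.
function applyMask(image: RawImage, rects: Rect[], color: number[] = [255, 0, 255, 255]): RawImage {
  const data = new Uint8Array(image.data); // copy, so the input stays untouched
  for (const r of rects) {
    const xEnd = Math.min(r.x + r.width, image.width);
    const yEnd = Math.min(r.y + r.height, image.height);
    for (let y = Math.max(r.y, 0); y < yEnd; y++) {
      for (let x = Math.max(r.x, 0); x < xEnd; x++) {
        data.set(color, (y * image.width + x) * 4);
      }
    }
  }
  return { width: image.width, height: image.height, data };
}

// Exact comparison: count pixels where any channel differs.
function countDiffPixels(a: RawImage, b: RawImage): number {
  let diff = 0;
  for (let i = 0; i < a.data.length; i += 4) {
    for (let c = 0; c < 4; c++) {
      if (a.data[i + c] !== b.data[i + c]) { diff++; break; }
    }
  }
  return diff;
}

// Two 4x2 white images; the pixel at (3, 0) stands in for a timestamp that changed.
const white = new Uint8Array(4 * 2 * 4).fill(255);
const before: RawImage = { width: 4, height: 2, data: new Uint8Array(white) };
const after: RawImage = { width: 4, height: 2, data: new Uint8Array(white) };
after.data.set([0, 0, 0, 255], 3 * 4);

const timestampArea: Rect[] = [{ x: 3, y: 0, width: 1, height: 1 }];
const unmaskedDiff = countDiffPixels(before, after);
const maskedDiff = countDiffPixels(applyMask(before, timestampArea), applyMask(after, timestampArea));

console.log(unmaskedDiff); // 1
console.log(maskedDiff);   // 0
```

Masking keeps the element in the layout, so surrounding content does not reflow, whereas removing the element from the DOM can move neighbors and change the rest of the screenshot.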

Prompt your AI

Use these three in order. Each builds on the one before.

1. Basics & terminology

In one paragraph, explain what visual regression testing is, what bug it catches that functional testing misses, and what the main false positive challenge is (why tests can fail without any actual bug).

2. Why it works (the mechanism)

Walk me through the baseline management workflow: how baselines are created on the first run, how they're stored (in the repo or in a cloud service), who approves baseline updates when a designer intentionally changes a component, and what the diff review process looks like.

3. Advanced — application & what's next

I'm adding visual regression to a large e-commerce site with 200+ pages and 3 themes (light, dark, high-contrast). The site has dynamic content (user names, dates, product prices). Walk me through: scope selection (which pages/states), masking strategy for dynamic content, baseline storage and approval workflow, false positive rate target, and CI execution time budget.