Generative AI has sharply increased the speed of test case creation. A QA engineer who knows how to prompt effectively can generate 20 well-structured test cases from a user story in 2 minutes, a task that used to take 30 minutes. But AI-generated tests are only as good as the prompt: vague prompts produce generic, happy-path-only tests that miss edge cases and negative scenarios. The engineers who benefit most from AI test generation are those who write precise prompts that encode their domain knowledge: user roles, business rules, data constraints, and known failure modes.
Prompt engineering for test generation works best when the prompt states what the model cannot infer from the user story alone: user roles, business rules, known security failure modes, and boundary conditions. A well-structured prompt produces a usable first-draft test matrix in two minutes; a vague one produces a happy-path list that misses most of the value. Engineers who write precise test-generation prompts get broader coverage without spending more time on initial test design.
# AI test generation prompt patterns — paste these into any LLM
USER_STORY = """
As a registered user, I want to reset my password via email
so that I can regain access to my account if I forget my credentials.
Acceptance criteria:
- User enters their registered email address
- System sends a password reset link (valid for 24 hours)
- User clicks the link and sets a new password (min 8 chars, 1 uppercase, 1 number)
- Old password no longer works after reset
- Reset link can only be used once
"""
# Pattern 1: Generate comprehensive test cases
def test_case_prompt(user_story: str) -> str:
    return f"""You are a senior QA engineer. Given this user story:
{user_story}
Generate a comprehensive test case suite covering:
1. Happy path (successful scenarios)
2. Negative tests (invalid inputs, errors)
3. Edge cases and boundary values
4. Security tests (link reuse, expired link, other user's link)
5. Performance considerations (what happens under load)
For each test case, provide:
- Test Case ID (TC-001, TC-002, ...)
- Title (concise, descriptive)
- Preconditions
- Steps (numbered)
- Expected Result
- Test Type (functional/security/edge case)
Format as a Markdown table."""
# Pattern 2: Generate negative tests only
def negative_test_prompt(user_story: str) -> str:
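    # Pass the full story: truncating it to 200 characters would cut off most of
    # the acceptance criteria, leaving the model no validation rules to break.
    return f"""You are a QA engineer specializing in security and edge cases.
For this feature: {user_story}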
Generate 10 negative test cases that attempt to:
- Break the validation rules
- Exploit the reset flow for security vulnerabilities (IDOR, link reuse, timing attacks)
- Test error handling and recovery
For each: title, input, expected error response."""
# Pattern 3: Generate test data
def test_data_prompt(user_story: str) -> str:
    return f"""For the password reset feature, generate test data covering:
- Valid emails: 5 examples (standard, with +alias, with subdomain, long)
- Invalid emails: 5 examples (no @, no domain, special chars)
- Valid passwords: 5 examples meeting all criteria
- Invalid passwords: 5 examples breaking each rule (too short, no uppercase, no number)
Format as a Python dict."""
print("Prompt 1 (full test suite):")
print(test_case_prompt(USER_STORY)[:300] + "...")
print("\nPrompt 2 (negative tests):")
print(negative_test_prompt(USER_STORY)[:300] + "...")
python3 main.py
Run test_case_prompt through any LLM. Count how many test cases it generates. Are there any you wouldn't have thought of?
Run test_data_prompt through any LLM. Then ask it to generate the same data as a pytest @pytest.mark.parametrize decorator. This is a 2-minute task that previously took 15 minutes.
Use these three in order. Each builds on the one before.
In one paragraph, explain what AI-assisted test generation is and what it can and can't replace in a QA engineer's workflow. What is a QA engineer with good prompting skills better at than a QA engineer without them?
Walk me through why the quality of an AI-generated test suite depends heavily on the quality of the prompt. Specifically: what information in a prompt produces edge cases and negative tests that a generic prompt misses (domain rules, user roles, data constraints, known failure modes)?
I want to build an AI-assisted QA workflow for my team: user stories come in → AI generates first-draft test cases → QA reviews and enriches → tests are implemented. Walk me through: the prompt template structure, the review checklist to catch what AI missed, the types of tests AI consistently gets wrong (security, performance, domain-specific rules), and how to measure whether this workflow saves time vs creates rework.
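Part of the review step in that workflow can be automated. As a minimal sketch, assuming the model returned the Markdown table requested in Pattern 1 with a "Test Type" column, the code below parses the table and reports which test types are missing. `parse_markdown_table`, `missing_test_types`, and `REQUIRED_TYPES` are hypothetical helpers, and real model output may use different headers or casing, so treat this as a first-pass check and not a substitute for human review.

```python
# Test types requested in the Pattern 1 prompt (assumed to appear verbatim in the output).
REQUIRED_TYPES = {"functional", "security", "edge case"}


def parse_markdown_table(text: str) -> list[dict[str, str]]:
    """Parse a simple pipe-delimited Markdown table into a list of row dicts.

    Assumes no cell contains a literal '|' character.
    """
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore prose around the table
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if len(rows) < 2:
        return []
    header, body = rows[0], rows[1:]
    # Drop the separator row, whose cells contain only dashes, colons, or spaces.
    body = [r for r in body if not all(set(cell) <= set("-: ") for cell in r)]
    return [dict(zip(header, r)) for r in body]


def missing_test_types(cases: list[dict[str, str]],
                       required: set[str] = REQUIRED_TYPES) -> set[str]:
    """Return the required test types that no generated case claims to cover."""
    present = {case.get("Test Type", "").lower() for case in cases}
    return set(required) - present


sample = """
| Test Case ID | Title | Test Type |
|---|---|---|
| TC-001 | Reset with valid email | Functional |
| TC-002 | Reuse a consumed reset link | Security |
"""
cases = parse_markdown_table(sample)
print(missing_test_types(cases))  # {'edge case'}
```

A check like this only confirms that each category is represented at least once. Whether the security or edge-case tests are actually meaningful still depends on the reviewer's domain knowledge.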