Tonight's Big Idea
Working code is not the same as a working system.
If you haven't tested your system as a user, your system is not ready.
1 Tonight's Schedule (60 min)
This is the condensed version of the full 3-hour W33D4 plan. The capstone build block continues outside of class — finish your testing work before W34.
0:00–0:08
Hook + Quick Concept (8 min)
Working code vs. working system. Three flavors of testing: unit, integration, end-to-end. Where each one catches a different class of bug.
0:08–0:25
Concrete Experience: Try to Break Your System (17 min)
Open your capstone and use it the way a user would. Try empty input, repeated queries, unusual formats, and ambiguous questions. Log every failure in week33_testing/observations.md. Use the Break-It Lab as your prompt list.
0:25–0:32
Reflection: What Broke? (7 min)
One team at a time, 60 seconds each: where did your system fail, and was the failure in a component or in how the components talked to each other?
0:32–0:42
Mini Lab: Define 5 Test Cases (10 min)
Using the Test Case Builder, write exactly five test cases: one normal query, one repeated query, one invalid input, one missing data case, and one edge case. Save them as week33_testing/test_cases.md.
0:42–0:55
Execute + Fix One (13 min)
Run your 5 test cases end-to-end. Pick the single most critical failure and fix it. Document the remaining failures in edge_cases.md so you can handle them after class.
0:55–1:00
Wrap + Handoff to W34 (5 min)
Key insight, what to finish before the next session (performance testing), and a 30-second exit ticket.
2 The Three Flavors of Testing
Each test type catches a different kind of bug. Tonight's focus is integration and end-to-end testing; most capstones already have some unit tests.
Unit
scope: one function
Does this function return the right thing given a specific input? Catches local logic bugs.
Integration
scope: components together
Does the API actually call the model? Does the model's output get parsed correctly downstream? Catches contract mismatches.
End-to-End
scope: full user journey
Can a user go from typing a question to getting a usable answer? Catches failures that only appear when the whole system runs together.
Key distinction
If each component works individually but fails when combined, the problem is integration, not the components. That's the bug class most capstones still have.
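To make the key distinction concrete, here is a minimal sketch in Python. Every name in it (build_prompt, fake_model, parse_reply, answer_question) is hypothetical and stands in for whatever your capstone actually uses; the model call is faked so the example runs offline. Each component passes its own unit check, yet the combined pipeline returns an empty answer because the model emits the key "answer" while the parser reads "text".

```python
import json


def build_prompt(question: str) -> str:
    """Component A (hypothetical): turn a user question into a prompt."""
    return f"Answer briefly: {question.strip()}"


def fake_model(prompt: str) -> str:
    """Component B (hypothetical): stand-in for a model call, returns JSON text."""
    return json.dumps({"answer": "42", "prompt_chars": len(prompt)})


def parse_reply(reply: str) -> str:
    """Component C (hypothetical): extract the answer, expecting the key 'text'."""
    data = json.loads(reply)
    return data.get("text", "")


def answer_question(question: str) -> str:
    """The integrated pipeline that a user actually exercises."""
    return parse_reply(fake_model(build_prompt(question)))


# Unit-level checks: each component is correct on its own terms.
assert build_prompt("  hi ") == "Answer briefly: hi"
assert json.loads(fake_model("x"))["answer"] == "42"
assert parse_reply('{"text": "ok"}') == "ok"

# Integration-level check: the combined result is empty, because the model
# and the parser disagree about the name of the key that holds the answer.
result = answer_question("What is six times seven?")
print(repr(result))
```

None of the three unit checks can see this failure, since each one feeds its component an input that matches that component's own assumptions. Only a test that passes real output from one component into the next exposes the mismatch.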
3 Learning Objectives
Primary
- Test your system as a complete user workflow
- Identify failures in how components interact
- Define and execute end-to-end test cases
- Identify and document edge cases
- Improve system reliability before demo
Secondary
- Shift from component thinking to system thinking
- Understand how real users interact with systems
- Prioritize fixes based on user impact
4 Key Terms
Integration Testing — components working together
End-to-End Testing — full user journey
Edge Case — unusual or unexpected input
Happy Path — the ideal scenario
Test Case — a defined scenario for testing
System Reliability — consistent correctness under expected and unexpected conditions
5 Required Deliverables
Create a week33_testing/ folder in your capstone repo with:
Files to commit before W34
observations.md — what broke when you tried to break it
test_cases.md — your 5 defined test scenarios
e2e_tests.md — step-by-step user journeys
edge_cases.md — edge cases tested and outcomes
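If you prefer to create the folder and the four files in one step, the sketch below is one way to do it in Python. The function name scaffold and the placeholder heading written into each file are assumptions for illustration; only the folder and file names come from the list above. Existing files are left untouched.

```python
from pathlib import Path

# File names taken from the deliverables list above.
DELIVERABLES = ["observations.md", "test_cases.md", "e2e_tests.md", "edge_cases.md"]


def scaffold(repo_root: str = ".") -> list[Path]:
    """Create week33_testing/ under repo_root and return the files newly created."""
    folder = Path(repo_root) / "week33_testing"
    folder.mkdir(parents=True, exist_ok=True)
    created = []
    for name in DELIVERABLES:
        path = folder / name
        if not path.exists():  # never overwrite notes you already wrote
            path.write_text(f"# {name}\n", encoding="utf-8")  # placeholder heading
            created.append(path)
    return created
```

Calling scaffold(".") from the root of your capstone repo creates any missing files; running it a second time returns an empty list.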
Success criteria
- Full system works from input to output
- Major failures identified and the most critical one addressed
- Edge cases documented (even if not all fixed yet)
- System behaves consistently across the 5 test cases
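One way to check the last criterion is to encode the five test cases as data and run them through the system's entry point. The sketch below assumes a hypothetical toy_system function in place of your capstone, and the Scenario class, the KNOWN lookup table, and the example questions are all invented for illustration. The five categories are the ones from the mini lab.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical knowledge source standing in for your capstone's data.
KNOWN = {"refund policy": "Refunds are accepted within 30 days."}


def toy_system(user_input: str) -> str:
    """Hypothetical stand-in for your capstone's entry point."""
    text = user_input.strip().lower()
    if not text:
        return "Please enter a question."
    for topic, answer in KNOWN.items():
        if topic in text:
            return answer
    return "Sorry, I have no data on that."


@dataclass
class Scenario:
    category: str                        # one of the five mini-lab categories
    inputs: list[str]                    # what the user types, in order
    check: Callable[[list[str]], bool]   # True when the outputs are acceptable


def run_cases(system: Callable[[str], str], cases: list[Scenario]) -> dict[str, bool]:
    """Run every scenario end-to-end; a crash counts as a failure."""
    results = {}
    for case in cases:
        try:
            outputs = [system(text) for text in case.inputs]
            results[case.category] = case.check(outputs)
        except Exception:
            results[case.category] = False
    return results


question = "What is the refund policy?"
cases = [
    Scenario("normal query", [question], lambda out: "30 days" in out[0]),
    Scenario("repeated query", [question, question], lambda out: out[0] == out[1]),
    Scenario("invalid input", ["   "], lambda out: out[0].startswith("Please")),
    Scenario("missing data", ["What is the shipping cost?"], lambda out: "no data" in out[0]),
    Scenario("edge case", ["REFUND POLICY " * 500], lambda out: len(out[0]) < 500),
]

results = run_cases(toy_system, cases)
for category, passed in results.items():
    print(f"{category}: {'pass' if passed else 'FAIL'}")
```

Swapping toy_system for your own entry point keeps the runner unchanged. The repeated query scenario sends the same input twice and compares the outputs, which is a direct check for consistent behavior.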
6 Pre-Class Resources
Skim if you didn't get to these before class — they map directly to the concepts we use tonight.
Instructor Framing
Users will not interact with your system perfectly. They will test its limits. Reliability matters more than feature count.