Overview
End-to-end tests are the most expensive tests to write, the most fragile to maintain, and the most valuable when they work. They verify what users actually experience — complete journeys through the application stack, from UI interaction to database persistence and back. But most E2E suites are written reactively: a critical bug ships, someone writes a test to prevent regression, and over time the suite becomes a patchwork of incident responses with no systematic coverage of the journeys that matter most.

This playbook teaches you how to use CoreStory to systematically identify, prioritize, and generate E2E tests for your application's critical user journeys. CoreStory acts as a Journey Oracle — it knows the user stories, acceptance criteria, and critical paths from the PRD, and it knows the application's routing, API endpoints, and data flow from the codebase. The agent extracts journey specifications from CoreStory, discovers E2E test conventions from the local project, and generates tests that verify complete user flows against acceptance criteria.

The primary deliverable is executable E2E test code that matches the project's existing E2E framework — Playwright, Cypress, Selenium, or whatever the team uses. Each test traces back to a user story or acceptance criterion, making coverage auditable and gaps visible.

How this relates to the Behavioral Test Coverage playbook: The Behavioral Test Coverage playbook generates unit-level and integration-level tests for individual business rules, validation logic, and state transitions. This playbook generates journey-level tests that verify complete user flows across the full stack. They're complementary — behavioral tests catch logic bugs in specific rules; E2E tests catch integration failures, UI regressions, and broken flows that span multiple components.

When to Use This Playbook
- Critical user journeys (signup, checkout, onboarding) have no automated E2E coverage
- You’re preparing for a major release and need confidence that key flows work end-to-end
- A UI or API overhaul requires regression tests for existing user journeys
- You’re onboarding to an unfamiliar application and want to understand and verify its primary flows
- The existing E2E suite is a patchwork of incident-driven tests with no systematic coverage
When to Skip This Playbook
- You need to test individual business rules or validation logic (use the Behavioral Test Coverage playbook)
- The application has no UI or user-facing API — there are no journeys to test
- No E2E test framework is configured in the project and you don’t want to set one up (this playbook generates tests for an existing framework, it doesn’t bootstrap one)
- The application is a CLI tool, library, or SDK — behavioral tests are more appropriate
Prerequisites
- Everything listed in the parent playbook prerequisites
- An E2E test framework configured in the project (Playwright, Cypress, Selenium, etc.)
- A running or deployable test environment that E2E tests can execute against
- (Recommended) Seed data or fixture strategy for the test environment
- (Recommended) CI/CD pipeline that can run E2E tests
How It Works
The Workflow Phases
| Phase | Name | Purpose | CoreStory Role |
|---|---|---|---|
| 1 | Setup & Scoping | Select project, create conversation, define journey scope | Setup |
| 2 | Journey Extraction | Extract critical user journeys, acceptance criteria, and happy/unhappy paths | Journey Oracle |
| 3 | E2E Convention Discovery | Understand the project’s E2E framework, selectors, fixtures, and patterns | Oracle + Navigator |
| 4 | Journey Prioritization | Rank journeys by business criticality and existing coverage | Navigator |
| 5 | Test Generation & Stabilization | Generate E2E tests, run them, address flakiness | — (local code + CoreStory validation) |
| 6 | Completion & Capture | Review coverage, commit tests, rename conversation | Knowledge capture |
HITL Gate
After Phase 4 (Journey Prioritization): Before generating E2E tests, a human should review the journey list and prioritization. E2E tests are expensive to maintain — generating tests for low-value journeys wastes ongoing maintenance effort. The human validates that the selected journeys are worth the investment.
Step-by-Step Walkthrough
Phase 1 — Setup & Scoping
Goal: Establish the E2E test generation session and define the scope.

Step 1.1: Find the project. Then create a conversation named after the scope, for example:
- "E2E Test Generation — User Onboarding Journey"
- “E2E Test Generation — Checkout & Payment Flows”
- “E2E Test Generation — Core User Journeys (Full Suite)”
| Scope | When to Use | Expected Output |
|---|---|---|
| Single journey | One critical flow (e.g., checkout) | 3–8 test scenarios |
| Journey cluster | Related flows (e.g., all auth journeys) | 8–15 test scenarios |
| Core journeys | All business-critical user flows | 15–30 test scenarios, across multiple sessions |
Phase 2 — Journey Extraction (Journey Oracle)
Goal: Extract the critical user journeys, their acceptance criteria, and the happy and unhappy paths for each.

Step 2.1: Query for user stories and journeys. For each journey, capture:
- Journey name, user persona, and goal
- Happy path steps with expected UI state at each step
- Unhappy paths with error states and recovery flows
- Preconditions and data requirements
- Acceptance criteria for each journey
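One way to keep extractions consistent across journeys is to capture them in a small structured form. The sketch below assumes a hypothetical `JourneySpec` shape; the field names are illustrative, not a CoreStory schema.

```typescript
// Hypothetical shape for a journey specification extracted in Phase 2.
// Field names are illustrative, not a CoreStory output format.
interface JourneyStep {
  action: string;          // what the user does
  expectedState: string;   // UI state expected after the action
}

interface JourneySpec {
  name: string;
  persona: string;
  goal: string;
  preconditions: string[]; // data or state required before the journey starts
  happyPath: JourneyStep[];
  unhappyPaths: { trigger: string; errorState: string; recovery: string }[];
  acceptanceCriteria: string[];
}

// Example: a signup journey captured from the PRD.
const signup: JourneySpec = {
  name: "New user signup",
  persona: "First-time visitor",
  goal: "Create an account and reach the dashboard",
  preconditions: ["email is not already registered"],
  happyPath: [
    { action: "open /signup", expectedState: "signup form visible" },
    { action: "submit valid details", expectedState: "redirected to /dashboard" },
  ],
  unhappyPaths: [
    {
      trigger: "submit an already-registered email",
      errorState: "inline 'email already in use' message",
      recovery: "user edits the email and resubmits",
    },
  ],
  acceptanceCriteria: ["account persists", "welcome email queued"],
};
```

Writing each journey down in one shape like this makes Phase 4 prioritization and Phase 5 generation mechanical: every test maps back to a named journey and its acceptance criteria.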
Phase 3 — E2E Convention Discovery (Oracle + Navigator)
Goal: Understand the project's E2E test patterns so generated tests match existing conventions.

Step 3.1: Query for E2E framework and structure. Determine:
- How tests launch and configure the browser/runner
- How authentication is handled in tests
- How test data is created and cleaned up
- How assertions verify page state (text content, element visibility, URL changes)
- How tests handle waits, timeouts, and async operations
- Whether tests run in parallel or serial
Record the discovered conventions:
- E2E framework and runner configuration
- Selector strategy (data-testid, roles, CSS, etc.)
- Page object or abstraction patterns
- Fixture/seed data approach
- Authentication strategy for tests
- 2–3 reference E2E test files to use as templates
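Selector strategy and abstraction patterns often meet in a page object. The sketch below shows the pattern in a framework-neutral way: the `Driver` interface, the `SignupPage` class, and all test IDs are hypothetical stand-ins for whatever Phase 3 actually discovers (a real page object would wrap Playwright's `Page` or Cypress commands instead of a hand-rolled interface).

```typescript
// Framework-neutral driver interface; a real page object would wrap
// Playwright's Page or Cypress's cy. All names here are illustrative.
interface Driver {
  goto(path: string): void;
  fill(testId: string, value: string): void;
  click(testId: string): void;
  currentPath(): string;
}

// Page object: encapsulates selectors and actions for one screen, so
// tests describe journeys and never touch raw selectors directly.
class SignupPage {
  constructor(private driver: Driver) {}

  open() { this.driver.goto("/signup"); }

  submit(email: string, password: string) {
    this.driver.fill("signup-email", email);    // data-testid selectors
    this.driver.fill("signup-password", password);
    this.driver.click("signup-submit");
  }
}

// Tiny in-memory fake driver, just to show the page object in use.
class FakeDriver implements Driver {
  path = "";
  filled: Record<string, string> = {};
  goto(p: string) { this.path = p; }
  fill(id: string, v: string) { this.filled[id] = v; }
  click(id: string) { if (id === "signup-submit") this.path = "/dashboard"; }
  currentPath() { return this.path; }
}

const driver = new FakeDriver();
const page = new SignupPage(driver);
page.open();
page.submit("ada@example.com", "s3cret!");
```

If the project already has page objects, reuse them; if not, this pattern keeps selector churn contained to one file per screen.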
Phase 4 — Journey Prioritization (Navigator)
Goal: Rank extracted journeys by value and identify existing coverage.

Step 4.1: Query for existing E2E coverage. Then rank the journeys against these criteria:
- Revenue impact — Journeys that directly affect conversion, payment, or retention (checkout, signup, subscription management)
- User frequency — Journeys that every user performs regularly (login, core workflow, search)
- Failure severity — Journeys where failure means data loss, security exposure, or user lock-out
- Complexity — Journeys with many steps, conditional paths, or cross-service interactions (higher value because they’re harder to test manually)
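These criteria can be folded into a rough weighted score to produce a first ranking for the HITL review. The weights and the 0–3 scales below are arbitrary illustrations, not a CoreStory feature; tune them with the domain expert at the gate.

```typescript
// Rank journeys by the four criteria above. Weights are illustrative;
// adjust them with the domain expert at the HITL gate.
interface JourneyScore {
  name: string;
  revenueImpact: number;   // 0–3
  userFrequency: number;   // 0–3
  failureSeverity: number; // 0–3
  complexity: number;      // 0–3
  hasCoverage: boolean;    // already covered by an existing E2E test?
}

function priority(j: JourneyScore): number {
  const raw =
    3 * j.revenueImpact +
    2 * j.userFrequency +
    3 * j.failureSeverity +
    1 * j.complexity;
  // Already-covered journeys sink: new tests should go to the gaps first.
  return j.hasCoverage ? raw * 0.25 : raw;
}

const journeys: JourneyScore[] = [
  { name: "checkout", revenueImpact: 3, userFrequency: 2, failureSeverity: 3, complexity: 2, hasCoverage: false },
  { name: "login", revenueImpact: 1, userFrequency: 3, failureSeverity: 3, complexity: 1, hasCoverage: true },
  { name: "profile edit", revenueImpact: 0, userFrequency: 1, failureSeverity: 1, complexity: 1, hasCoverage: false },
];

const ranked = [...journeys].sort((a, b) => priority(b) - priority(a));
```

The output is a starting point for the human review, not a decision: the score makes the trade-offs explicit, and the human at the gate confirms or reorders.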
HITL Gate: Present the prioritized journey list to the human. E2E tests carry ongoing maintenance cost — confirm the selected journeys are worth the investment.
Phase 5 — Test Generation & Stabilization
Goal: Generate E2E tests for each prioritized journey, run them, and stabilize against flakiness.

Step 5.1: Generate the happy path test. For each journey, start with the happy path. Using the journey specification from Phase 2, the E2E conventions from Phase 3, and the reference test files as templates, write the test. Each test should:
- Follow the project's E2E file naming and organization conventions
- Use the project’s selector strategy (data-testid, roles, etc.)
- Use the project’s page object or abstraction patterns if they exist
- Include a descriptive test name that references the journey and acceptance criteria
- Set up required test data using the project’s fixture approach
- Clean up test data after execution (or use isolation patterns)
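As a concrete illustration, here is what a generated happy-path test might look like if the project happened to use Playwright with `data-testid` selectors and a user fixture helper. Every identifier here (the fixture module, the test IDs, and the routes) is hypothetical and must be replaced with the project's real conventions from Phase 3.

```typescript
import { test, expect } from "@playwright/test";
// Hypothetical project fixture helpers; substitute the real ones from Phase 3.
import { createTestUser, deleteTestUser } from "../fixtures/users";

// Journey: new user signup -> dashboard
// Traces to: PRD story "First-time visitor can create an account"
test("new user can complete signup and reach dashboard", async ({ page }) => {
  const user = await createTestUser({ registered: false });

  await page.goto("/signup");
  await page.getByTestId("signup-email").fill(user.email);
  await page.getByTestId("signup-password").fill(user.password);
  await page.getByTestId("signup-submit").click();

  // Wait for a condition (the URL change), not a fixed delay.
  await expect(page).toHaveURL(/\/dashboard/);
  await expect(page.getByTestId("welcome-banner")).toBeVisible();

  await deleteTestUser(user); // keep the environment clean for the next run
});
```

Note how the test name references the journey, the selectors follow one strategy, and data setup and cleanup are local to the test, matching the checklist above.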
When a generated test fails, triage by failure type:

| Failure Type | Meaning | Action |
|---|---|---|
| Selector not found | Element locator is wrong or page structure has changed | Fix the selector — use the project’s selector strategy |
| Timeout | Page load, API call, or animation took longer than expected | Add appropriate waits — prefer waiting for specific conditions over fixed delays |
| State mismatch | Test data wasn’t set up correctly or prior test left dirty state | Fix the fixture/setup — ensure test isolation |
| Assertion failure | The journey doesn’t behave as the specification describes | Investigate: is the spec wrong or is the application wrong? Flag for human review |
| Flaky pass/fail | Test passes sometimes and fails other times | See the flakiness management section below |
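The timeout guidance above amounts to polling for a condition under a deadline. Mature frameworks do this for you (Playwright auto-waits on locators; Cypress retries assertions), so this small synchronous helper is illustrative of the principle only, not something to copy into a real suite.

```typescript
// Illustrative only: poll a condition until it returns true or a deadline
// passes. Real E2E frameworks build this in; never busy-wait in real tests.
function waitForCondition(
  check: () => boolean,
  timeoutMs = 5000,
  intervalMs = 50,
): boolean {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (check()) return true;
    // A real async runner would `await sleep(intervalMs)` here;
    // a busy-wait keeps this sketch synchronous.
    const until = Date.now() + intervalMs;
    while (Date.now() < until) { /* spin */ }
  }
  return check(); // one final check at the deadline
}

// Example: a condition that becomes true on the third poll.
let polls = 0;
const becameTrue = waitForCondition(() => ++polls >= 3, 1000, 10);
```

The key property is that the wait ends as soon as the condition holds, instead of always paying a fixed delay that is either too short (flaky) or too long (slow).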
Then extend coverage to the unhappy paths from Phase 2. Verify that:
- Validation errors display the correct messages
- The user can recover from error states (fix input and retry)
- Network failures are handled gracefully (error messages, retry options)
- Partial completion states are handled (back button, refresh, timeout)
Before finishing the phase, confirm:
- All new tests pass consistently (run at least 3 times to check for flakiness)
- No existing tests broke
- Suite execution time is acceptable
Flakiness Management
E2E tests are inherently more prone to flakiness than unit or integration tests. Address flakiness proactively:

Prefer condition-based waits over fixed delays. Wait for a specific element to appear, an API call to complete, or a URL to change — not for a fixed number of milliseconds.

Isolate test data. Each test should create its own data and not depend on state from other tests. If the framework supports parallel execution, tests must be fully independent.

Handle animations and transitions. If the application uses animations, either disable them in the test environment or wait for animation completion before asserting.

Retry on infrastructure flakiness, not on application bugs. Most E2E frameworks support test retries. Use retries to handle transient infrastructure issues (network blips, slow CI runners) — but if a test consistently fails on the same assertion, that's a real bug, not flakiness.

Test against a stable environment. E2E tests should run against a dedicated test environment with controlled data, not against a staging environment that other teams are actively deploying to.

Phase 6 — Completion & Capture
Goal: Finalize generated tests, capture the session, and report coverage.

Step 6.1: Review coverage against the journey inventory.

Tips & Best Practices
Start with the highest-value, simplest journey. The first E2E test you generate should be the most business-critical flow with the fewest steps. This gives you maximum value with minimum stabilization effort, and establishes conventions for subsequent tests.

Generate fewer, more comprehensive E2E tests. Unlike behavioral tests where you want broad coverage of individual rules, E2E tests should focus on complete journeys. Ten well-structured journey tests are more valuable than fifty shallow click-through tests.

Use the pyramid principle. E2E tests sit at the top of the testing pyramid. They should verify journey-level behavior, not re-test business logic that's already covered by behavioral tests. If you've run the Behavioral Test Coverage playbook, the E2E tests can focus on flow and integration rather than rule verification.

Name tests after journeys, not pages. test_new_user_can_complete_signup_and_reach_dashboard is more meaningful than test_signup_page. Journey-oriented names make coverage gaps visible at a glance.
Keep test data minimal. Create only the data each test needs, and create it as close to the test as possible. Shared seed data across tests creates hidden dependencies and ordering requirements.
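A lightweight way to enforce this is a per-test data factory that mints unique, self-describing records instead of reading shared seeds. The names below (`makeTestUser`, the tag format) are hypothetical sketches, not a prescribed convention.

```typescript
// Mint unique test data per test instead of sharing seed records.
// A unique suffix keeps parallel runs and retries from colliding.
let counter = 0;

function makeTestUser(testName: string) {
  const suffix = `${Date.now()}-${++counter}`;
  return {
    email: `e2e-${testName}-${suffix}@example.test`,
    password: "a-strong-test-password",
    // The tag makes orphaned records easy to find and sweep in cleanup jobs.
    tag: `e2e:${testName}`,
  };
}

const a = makeTestUser("checkout");
const b = makeTestUser("checkout");
```

Because every record is unique and tagged with the test that created it, tests stay order-independent and leftover data from failed runs is trivial to identify and delete.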
When to involve a domain expert:
- After Phase 2 (journey extraction) — to validate that the extracted journeys match real user behavior
- After Phase 4 (prioritization) — to confirm which journeys are worth the maintenance investment
- When E2E tests fail on assertion — to determine whether the application behavior or the specification is wrong
Troubleshooting
CoreStory returns journeys that don't match the current application.
The PRD may describe planned features that aren't implemented yet, or features that were descoped. Cross-reference journey specifications against the actual application routes and UI components before generating tests.

Generated tests fail on selectors.
The selector strategy from Phase 3 doesn't match reality, or the application uses dynamically generated selectors. Inspect the actual DOM in a browser and update selectors to match. Prefer data-testid attributes or accessibility roles over CSS class selectors, which are fragile.
Tests pass locally but fail in CI.
Common causes: different viewport sizes, missing fonts or assets, slower execution speed on CI runners (needs longer timeouts), different environment configuration, or tests depending on local seed data that doesn’t exist in CI.
Tests are too slow.
E2E tests are inherently slower than unit tests, but they shouldn’t take minutes each. Common optimizations: parallelize independent tests, reuse authenticated sessions across tests (if the framework supports it), minimize redundant navigation, and ensure the test environment isn’t resource-starved.
Too many journeys to cover.
This is normal for mature applications. Focus on journeys that affect revenue, security, or data integrity. Use the prioritization from Phase 4 and plan a multi-session campaign — one journey cluster per session.
Agent Implementation Guides
Claude Code
Skill File
Save as .claude/skills/generate-e2e-tests/SKILL.md:
GitHub Copilot
Append to .github/copilot-instructions.md:
Cursor
Save as .cursor/rules/generate-e2e-tests.mdc:
Factory.ai
Save as .factory/droids/generate-e2e-tests.md: