Overview
You’ve modernized the component. It compiles. The tests pass. But does it actually do what the old code did? This is the question that derails modernization programs. The hard part was never understanding what the legacy code says. The hard part is proving the new code preserves what the legacy code does: every business rule, every edge case, every invariant that accumulated over years of production use.

Phase 6 of the modernization workflow is where behavioral equivalence is established. The Business Rules Inventory from Phase 2 defines the behavioral contract — every rule the modernized system must preserve. This playbook provides a systematic methodology for verifying that contract: tracing each rule from the inventory to its implementation in the modernized code, identifying behavioral differences, testing edge cases, and producing a structured equivalence report that gives stakeholders the confidence to retire the legacy component.

CoreStory serves as a Verifier throughout this phase — comparing behavioral semantics across legacy and modernized codebases. It also operates as an Oracle for understanding the intent behind legacy implementations: “Why does this code handle negative quantities differently from zero quantities? Is that a business rule or a bug?” This distinction matters because not every legacy behavior should be preserved — some behaviors are bugs, workarounds, or artifacts of obsolete requirements. The verification process must surface these distinctions for human judgment.

Who this is for: Engineers executing modernization work packages, QA leads responsible for sign-off, and domain experts who can validate whether behavioral differences are intentional improvements or regressions.
What you’ll get: A structured Behavioral Equivalence Report for each modernized component — rule-by-rule verification status, identified differences with analysis, missing rules, integration point verification, and a clear recommendation on whether the component is ready for the Eliminate phase.

When to Use This Playbook
- You’ve completed the Transform phase for a work package and the modernized component is functionally complete
- You need to verify that the modernized code preserves all business rules identified in Phase 2
- You’re in the Coexist phase and need to validate behavioral equivalence before retiring the legacy component
- A domain expert or stakeholder requires a structured verification report before approving legacy decommission
- You’ve identified behavioral differences during testing and need to systematically categorize them as intentional improvements, acceptable deviations, or regressions
When to Skip This Playbook
- You haven’t completed Phase 2 (Business Rules Inventory) — there’s no behavioral contract to verify against. Go back to Business Rules Extraction
- The modernization strategy is Rehost/Relocate with no application-level changes — behavioral equivalence is trivially preserved
- You’re still in the Transform phase — finish the implementation first, then verify
- The component is being Retired (decommissioned) rather than modernized — no verification is needed for code that’s being removed
Prerequisites
- A completed Business Rules Inventory (Phase 2) — this is the behavioral contract. Without it, there’s nothing to verify against
- A completed Transform phase for the work package under verification — the modernized component must be functionally complete
- A CoreStory account with both the legacy codebase and the modernized codebase ingested (or the modernized code available in the same project)
- An AI coding agent with CoreStory MCP configured (see Supercharging AI Agents for setup)
- (Recommended) Access to a domain expert who can validate whether behavioral differences are intentional improvements, acceptable deviations, or regressions
- (Recommended) The legacy system running in a test environment for comparison testing where static analysis is insufficient
How It Works
CoreStory MCP Tools Used
| Tool | Step(s) | Purpose |
|---|---|---|
| list_projects | 1 | Confirm the target project |
| create_conversation | 1 | Start a dedicated verification thread |
| send_message | 2, 3, 4, 5 | Query CoreStory for rule tracing, behavioral comparison, and edge case analysis |
| list_conversations | 1 | Find the Business Rules Inventory conversation and prior phase threads |
| get_conversation | 1 | Retrieve the Business Rules Inventory for verification |
| get_project_prd | 1 | Retrieve the PRD for business context behind rules |
| rename_conversation | 5 | Mark the completed thread with a “RESOLVED” prefix |
The Behavioral Verification Workflow
Note: The steps below are internal to this playbook. They are sub-steps of Phase 6 in the six-phase modernization framework, not a separate numbering system.

This playbook follows a five-step pattern:
- Setup — Load the Business Rules Inventory and the modernized component. Establish the verification scope and identify which rules apply to the component under test.
- Rule-by-Rule Verification — For each business rule in the inventory, trace it to its implementation in the modernized code and verify semantic equivalence with the legacy implementation.
- Edge Case & Invariant Testing — Identify boundary conditions, invariants, and edge cases that might behave differently between legacy and modernized implementations.
- Integration Point Verification — Verify that the modernized component interacts correctly with adjacent systems — both legacy components still in the Coexist phase and other modernized components.
- Equivalence Report — Produce the structured Behavioral Equivalence Report with rule-by-rule status, difference analysis, and a recommendation.
Verification Approaches
Behavioral verification uses a tiered strategy. CoreStory is most powerful in Tier 1 (it holds both codebases’ semantic understanding) and in generating the test cases and input sets for Tiers 2–4. The higher tiers require additional tooling and running environments, but CoreStory guides what to test and interprets the results.

Tier 1: Static Verification (CoreStory-Assisted)
No running code required. CoreStory compares behavioral semantics across the legacy and modernized codebases.

Rule tracing is the primary method. For each business rule in the inventory, CoreStory locates the implementation in both the legacy and modernized codebases and compares the behavioral semantics — not the syntax, but the actual logic: conditions, transformations, side effects, and outputs.

Invariant checking verifies that system-wide constraints are preserved. These are often implicit in legacy code — never explicitly documented but enforced by the implementation. Examples: “account balances never go negative,” “order status transitions are one-directional,” “all timestamps are stored in UTC.” CoreStory can identify invariants in the legacy code and verify they’re maintained in the modernized version.

Data flow comparison traces the path of data through both systems for a given operation. Where does data enter? How is it transformed? What side effects occur? Where does it exit? Divergences in the data flow are often the source of subtle behavioral differences.

Edge case generation uses CoreStory’s understanding of the business rules to identify boundary conditions: null inputs, maximum values, concurrent access, timezone boundaries, leap years, currency rounding. These are the cases where modernized code most often diverges from legacy behavior.

Tier 2: Dynamic Verification (Requires Running Code)
When static analysis can’t establish equivalence with sufficient confidence, dynamic testing provides empirical evidence.

Characterization testing (Golden Master testing) — coined by Michael Feathers in Working Effectively with Legacy Code — captures the legacy system’s actual outputs for a comprehensive set of inputs. Those captured outputs become the “golden master” that the modernized system must match. This is the most practical approach for systems where business rules were never formally documented: the legacy system’s behavior is the specification. Tool support: ApprovalTests, custom test harnesses.

Contract testing defines expected behavior at integration points using consumer-driven contracts. Particularly relevant for verifying API gateway / façade layer behavior during the Coexist phase. When the modernized service replaces the legacy component, every consumer’s contract must still be satisfied. Tool support: Pact, Spring Cloud Contract.

Tier 3: Production-Grade Verification (Requires Production-Like Environment)
For high-risk components where production-like evidence is required before cutover.

Shadow traffic testing (dark launching) routes copies of production requests to both the legacy and modernized systems simultaneously, compares responses, and flags discrepancies. Run until the discrepancy rate drops below an acceptable threshold (typically below 0.01% for financial or regulatory-sensitive services, below 0.1% for standard services). Reference: Microsoft Engineering Playbook’s guidance on shadow testing.

Record-replay testing captures actual production traffic from the legacy system, replays it against the modernized system, and compares outputs. Particularly valuable for batch processing workflows where you can capture input files and compare output files line by line. AWS Transform uses this approach: automated “bit-by-bit matching” of legacy expected data versus modernized results.

Tier 4: Data Migration Verification
If the modernization involves moving or transforming data stores, data integrity is a verification concern distinct from behavioral verification.

Data reconciliation compares legacy and modernized data stores across multiple dimensions: row counts (do we have all the records?), checksums (is the data identical?), semantic validation (do derived values compute correctly?), and referential integrity (are all foreign key relationships preserved?). This should be run both immediately after migration and again after the system has been processing live data in the Coexist phase.

When static analysis isn’t enough: If Tier 1 verification cannot establish equivalence for a rule — the logic is too complex, the edge cases too numerous, or the legacy implementation too opaque — escalate to Tier 2 (characterization testing) or Tier 3 (shadow traffic) for that specific rule. The Behavioral Equivalence Report should note which tier was used for each rule’s verification.
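The row-count and checksum dimensions of data reconciliation can be sketched with two database connections. This is a minimal illustration, not a production tool — the `orders` table and its columns are hypothetical stand-ins, and referential-integrity and derived-field checks would be separate passes:

```python
import hashlib
import sqlite3

def reconcile(legacy_conn, modern_conn, table, key):
    """Compare row counts and a whole-table checksum between two stores."""
    def snapshot(conn):
        rows = conn.execute(f"SELECT * FROM {table} ORDER BY {key}").fetchall()
        return len(rows), hashlib.sha256(repr(rows).encode()).hexdigest()
    (n_a, h_a), (n_b, h_b) = snapshot(legacy_conn), snapshot(modern_conn)
    return {"row_counts_match": n_a == n_b, "checksums_match": h_a == h_b}

# Demo with two in-memory stores standing in for the legacy and migrated databases.
legacy, modern = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for conn in (legacy, modern):
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
result = reconcile(legacy, modern, "orders", "id")
assert result == {"row_counts_match": True, "checksums_match": True}
```

Ordering by a stable key before hashing matters: two stores with identical data but different physical row order would otherwise produce different checksums.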
HITL Gate
After Step 5 (Equivalence Report): A domain expert or engineering lead validates the Behavioral Equivalence Report before the legacy component is retired. This is the final gate before the Eliminate phase — the human must confirm that all verified rules are equivalent, all differences are acceptable, and all missing rules are addressed.
Step-by-Step Walkthrough
Step 1: Setup
Start by loading the Business Rules Inventory and establishing the verification scope for the specific component under test. Confirm the project and locate the Business Rules Inventory:

Step 2: Rule-by-Rule Verification
This is the core of the verification process. For each business rule in the scoped checklist, verify that the modernized implementation preserves the behavioral semantics. The following example illustrates what rule tracing produces for a single business rule — tracing the legacy and modernized implementations side by side with comparison indicators.

Trace rules to modernized implementations:

Step 3: Edge Case & Invariant Testing
Business rules are tested against their expected behavior. Edge cases test the boundaries where implementations often diverge.

Identify boundary conditions:

Step 4: Integration Point Verification
Verify that the modernized component interacts correctly with adjacent systems — especially important during the Coexist phase, when some components are legacy and some are modernized.

API contract verification:

Step 5: Equivalence Report
[Interactive dashboard: rule-by-rule verification status at a glance, with per-rule details. Sample data shown — replace with actual verification results for your component.]

Compile all verification findings into the structured Behavioral Equivalence Report.

Generate the report:
Output Format: Behavioral Equivalence Report
Each verified component produces a report following this template:

Prompting Patterns Reference
Verification Patterns
| Pattern | Example |
|---|---|
| Rule tracing | “Where is [Rule ID] implemented in the modernized code? Is the implementation semantically equivalent to the legacy version at [file:line]?” |
| Behavioral diff | “Compare how the legacy and modernized systems handle [specific scenario]. Are the outcomes identical?” |
| Intent inference | “The legacy code at [file:line] does [behavior]. Is this an intentional business rule or an artifact of the implementation?” |
| Completeness check | “Are there any business rules in the legacy [ComponentName] that are NOT in the Business Rules Inventory? Rules we might have missed?” |
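When a rule can be exercised programmatically, the behavioral-diff pattern above reduces to running identical scenarios through both implementations and comparing outcomes rather than code. A minimal sketch — the rule ID and the two shipping-fee functions are hypothetical stand-ins for calls into each system:

```python
def legacy_shipping_fee(weight_kg: float) -> float:
    # BR-017 (hypothetical rule): free shipping at 20 kg and above
    return 0.0 if weight_kg >= 20 else 4.95

def modern_shipping_fee(weight_kg: float) -> float:
    return 0.0 if weight_kg >= 20 else 4.95

def verify_rule(rule_id, scenarios, legacy_fn, modern_fn):
    """Compare outcomes, not code: each differing scenario is a candidate
    regression, acceptable deviation, or intentional improvement."""
    diffs = [(s, legacy_fn(s), modern_fn(s))
             for s in scenarios if legacy_fn(s) != modern_fn(s)]
    return {"rule": rule_id,
            "status": "EQUIVALENT" if not diffs else "DIVERGENT",
            "differences": diffs}

# Scenario set deliberately straddles the rule's boundary (just below, at, just above).
report = verify_rule("BR-017", [0.0, 19.99, 20.0, 20.01, 500.0],
                     legacy_shipping_fee, modern_shipping_fee)
assert report["status"] == "EQUIVALENT"
```

Any `DIVERGENT` result goes into the Behavioral Equivalence Report with its triggering scenarios, so the classification (regression vs. improvement) can be made by a human.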
Edge Case Patterns
| Pattern | Example |
|---|---|
| Boundary testing | “What happens when [field] is null, empty, at maximum value, or at minimum value — in both legacy and modern?” |
| Concurrency | “What happens when two users simultaneously trigger [operation]? Does the legacy system handle this differently than modern?” |
| Temporal boundaries | “How do the legacy and modernized systems handle [operation] at timezone boundaries, DST transitions, and leap year dates?” |
| Error propagation | “When [upstream dependency] fails, how do legacy and modern handle the error? Same retry logic, same fallback, same error response?” |
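The boundary-testing pattern above translates directly into a small matrix of cases run against both implementations. A hedged sketch — the quantity validators and the 9999 limit are illustrative assumptions, not real API:

```python
def legacy_validate_qty(qty):
    # Hypothetical legacy rule: reject null, non-positive, and >9999 quantities.
    if qty is None or qty <= 0:
        return "REJECT"
    return "ACCEPT" if qty <= 9999 else "REJECT"

def modern_validate_qty(qty):
    if qty is None or qty <= 0:
        return "REJECT"
    return "ACCEPT" if qty <= 9999 else "REJECT"

# Null, below/at each edge, and just past each edge — the cases where
# modernized code most often diverges from legacy behavior.
BOUNDARY_CASES = [None, -1, 0, 1, 9999, 10000]
mismatches = [q for q in BOUNDARY_CASES
              if legacy_validate_qty(q) != modern_validate_qty(q)]
assert mismatches == []
```

The same matrix idea extends to empty strings, maximum lengths, and timezone or leap-year dates; CoreStory's edge-case generation is what fills in which boundaries matter for a given field.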
Integration Patterns
| Pattern | Example |
|---|---|
| Contract comparison | “Compare the API contract of legacy [endpoint] with modernized [endpoint]. Any differences in request/response format?” |
| Data flow tracing | “Trace the data flow for [operation] through both systems. Are there points where data is transformed differently?” |
| Consumer impact | “Which systems consume output from [ComponentName]? Will any of them break with the modernized version’s behavior?” |
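A crude but useful structural version of the contract-comparison pattern: verify the modernized response still exposes every field (with the same type) that consumers of the legacy response rely on. The field names here are hypothetical examples, and a real check would use captured responses or a schema tool rather than literals:

```python
LEGACY_RESPONSE = {"order_id": 42, "status": "SHIPPED", "total": "19.99"}
MODERN_RESPONSE = {"order_id": 42, "status": "SHIPPED", "total": "19.99",
                   "currency": "USD"}  # purely additive fields are usually safe

def contract_violations(legacy, modern):
    """Flag fields that consumers of the legacy contract would miss or
    misparse: removed keys and keys whose value type changed."""
    missing = [k for k in legacy if k not in modern]
    retyped = [k for k in legacy
               if k in modern and type(legacy[k]) is not type(modern[k])]
    return missing, retyped

missing, retyped = contract_violations(LEGACY_RESPONSE, MODERN_RESPONSE)
assert missing == [] and retyped == []
```

Note the type check would catch a classic regression: `total` changing from a string-encoded decimal to a float, which silently breaks consumers doing exact money comparisons.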
Characterization Test Patterns
| Pattern | Example |
|---|---|
| Golden master generation | “What inputs should we capture to create a comprehensive golden master for [ComponentName]? Consider: representative cases, boundary values, error conditions, and high-frequency production scenarios.” |
| Input set design | “Based on the business rules inventory, what is the minimum set of test inputs that exercises every rule in [ComponentName]?” |
| Output comparison | “The golden master test for [scenario] shows a difference in [field]. Is this a behavioral change or a formatting/precision difference?” |
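The golden-master loop these patterns feed can be sketched in a few lines: record the legacy system's outputs once, then replay the same inputs through the modernized system. The pricing functions and the volume-discount rule are hypothetical stand-ins for calls into each system:

```python
import json
import tempfile
from pathlib import Path

def legacy_price(qty: int) -> float:
    # Hypothetical legacy pricing rule: 10% volume discount at 100+ units.
    return round(qty * 9.99 * (0.9 if qty >= 100 else 1.0), 2)

def modern_price(qty: int) -> float:
    return round(qty * 9.99 * (0.9 if qty >= 100 else 1.0), 2)

INPUTS = [0, 1, 99, 100, 101, 10_000]  # boundary-heavy input set
golden = Path(tempfile.gettempdir()) / "golden_master_pricing.json"

# Record once against the legacy system: its outputs become the specification.
golden.write_text(json.dumps({str(q): legacy_price(q) for q in INPUTS}))

# Verify: replay the same inputs through the modernized system.
expected = json.loads(golden.read_text())
divergent = [q for q in INPUTS if modern_price(q) != expected[str(q)]]
assert divergent == []
```

In practice the golden-master file is committed alongside the test suite and the input set comes from the business rules inventory plus production traffic, so this is a sketch of the shape rather than the scale.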
Production Verification Patterns
| Pattern | Example |
|---|---|
| Shadow traffic scope | “Which API endpoints in [ComponentName] handle the highest-risk business logic? These should be prioritized for shadow traffic testing.” |
| Discrepancy analysis | “The shadow traffic comparison flagged [N] discrepancies. Analyze the patterns: are these concentrated in specific scenarios, or distributed randomly?” |
| Record-replay design | “What production traffic should we capture for record-replay testing of [ComponentName]? Consider: peak load scenarios, end-of-period processing, and cross-timezone operations.” |
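The comparator at the heart of shadow-traffic testing is simple; the operational work is in the traffic tee and the discrepancy triage. A minimal sketch, assuming both systems are callable (in production these would be HTTP calls, and the handlers here are hypothetical stand-ins):

```python
def legacy_handler(request):
    return {"total": request["qty"] * 10}

def modern_handler(request):
    return {"total": request["qty"] * 10}

def shadow_compare(requests, threshold=0.001):
    """Send each request to both systems, flag response mismatches, and check
    the discrepancy rate against a cutover threshold (0.1% here)."""
    discrepancies = [r for r in requests
                     if legacy_handler(r) != modern_handler(r)]
    rate = len(discrepancies) / len(requests)
    return rate, rate <= threshold, discrepancies

requests = [{"qty": q} for q in range(1, 1001)]
rate, within_threshold, diffs = shadow_compare(requests)
assert within_threshold and diffs == []
```

The flagged `diffs` list is the input to the discrepancy-analysis pattern above: cluster them by scenario before assuming they are random noise.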
Data Migration Patterns
| Pattern | Example |
|---|---|
| Reconciliation checklist | “What data integrity checks should we run after migrating [ComponentName]’s data? Consider: row counts, referential integrity, computed fields, and audit trail continuity.” |
| Semantic validation | “Which derived or computed fields in [ComponentName]’s data store need to be recalculated and validated after migration?” |
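Semantic validation of a derived field boils down to recomputing it from its source columns in the migrated store and comparing. A sketch — the column names, the `qty * unit_price` rule, and the tolerance are illustrative assumptions:

```python
# Rows as they landed in the migrated store (hypothetical schema).
migrated_rows = [
    {"qty": 3, "unit_price": 10.00, "line_total": 30.00},
    {"qty": 2, "unit_price": 4.50, "line_total": 9.00},
]

# Recompute the derived field and flag rows where the migrated value disagrees.
mismatches = [row for row in migrated_rows
              if abs(row["qty"] * row["unit_price"] - row["line_total"]) > 1e-9]
assert mismatches == []
```

For money fields a real implementation would use exact decimal arithmetic rather than a float tolerance; the point is that derived values are recomputed, not merely copied and counted.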
Best Practices
Verify against the inventory, not against the legacy code. The Business Rules Inventory from Phase 2 is the behavioral contract. Verify against that, not against every line of legacy code. Legacy code contains bugs, workarounds, dead code, and obsolete behavior — not all of it should be preserved. The inventory captures what should be preserved.

Classify differences before fixing them. Not every behavioral difference is a bug. Some are intentional improvements (the modernized version handles an edge case better). Some are acceptable deviations (the behavior differs in a way that doesn’t affect outcomes). Only regressions need to be fixed. Classifying before fixing prevents wasted work on “fixing” improvements.

Involve domain experts for ambiguous rules. When the legacy code does something unusual and neither the engineer nor CoreStory can determine whether it’s intentional, escalate to a domain expert. These ambiguous behaviors are often the most critical — they’re the institutional knowledge that exists only in the code and in people’s heads.

Verify invariants, not just individual rules. Individual rules can each be correct while their interaction produces different system-level behavior. Invariant checking catches these emergent differences — the system-wide constraints that no single rule fully defines but the system as a whole must maintain.

Don’t skip integration point verification. Even if every business rule is perfectly equivalent, the modernized component can still break consumers if the API contract, data format, or error handling differs. This is especially important during the Coexist phase, when the façade is translating between legacy and modern.

Test implicit behaviors. The most dangerous regressions are the ones that don’t map to any explicit business rule — default values, ordering guarantees, timing behavior, error message formats. These “invisible” behaviors often aren’t in the Business Rules Inventory because nobody thought to document them. Ask CoreStory to identify them.

Verify incrementally, not all at once. For components with many business rules, verify in batches grouped by functional area. This makes the verification manageable, allows parallel verification by multiple engineers, and produces partial results that can inform ongoing Transform work on other components.

Let verification upgrade confidence states. Each finding from earlier phases carries a confidence level (Verified, High-confidence, Hypothesized, or Contradicted — see Working with AI-Derived Findings). Phase 6 is where Hypothesized findings can be promoted: a rule that was Hypothesized during assessment but now passes Tier 1 static verification and Tier 2 characterization testing can be upgraded to Verified. Track these upgrades in the Behavioral Equivalence Report — they strengthen the evidence base for stakeholder sign-off.

Tailor verification depth to behavior tags. If Phase 3 assigned behavior tags, use them to calibrate verification effort. PRESERVE behaviors need full equivalence proof — same inputs, same outputs, same side effects. MODERNIZE behaviors need equivalence plus evidence of improvement. CHANGE behaviors need stakeholder sign-off that the new behavior matches intent, not legacy behavior. RETIRE behaviors need confirmation that removal doesn’t break consumers. This prevents over-investing in behaviors that were intentionally changed while under-investing in behaviors that must be identical.
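The "verify invariants, not just individual rules" practice can be sketched as a single property run against both implementations. Everything here is a hypothetical stand-in — the invariant is the "balances never go negative" example from Tier 1, and the withdraw functions model a case where legacy and modern differ in mechanism but both preserve the invariant:

```python
from decimal import Decimal

def legacy_withdraw(balance: Decimal, amount: Decimal) -> Decimal:
    # Hypothetical legacy behavior: overdraft attempts are silently ignored.
    return balance - amount if amount <= balance else balance

def modern_withdraw(balance: Decimal, amount: Decimal) -> Decimal:
    # Hypothetical modernized behavior: overdraft attempts are rejected loudly.
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance - amount

def invariant_violations(withdraw, cases):
    """Invariant: no input may produce a negative balance."""
    violations = []
    for balance, amount in cases:
        try:
            result = withdraw(balance, amount)
        except ValueError:
            continue  # an explicit rejection also preserves the invariant
        if result < 0:
            violations.append((balance, amount, result))
    return violations

cases = [(Decimal("100"), Decimal("50")), (Decimal("10"), Decimal("10")),
         (Decimal("0"), Decimal("1")), (Decimal("5"), Decimal("100"))]
assert invariant_violations(legacy_withdraw, cases) == []
assert invariant_violations(modern_withdraw, cases) == []
```

The two implementations are not behaviorally identical (silent no-op vs. raised error at the overdraft boundary) — the invariant check passes for both, which is exactly why rule-level behavioral diffs are still needed alongside invariant checks.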
Agent Implementation Guides
Claude Code
Setup
- Configure the CoreStory MCP server in your Claude Code settings (see CoreStory MCP Server Setup Guide).
- Add the skill file:
.claude/skills/behavioral-verification/SKILL.md with the content from the skill file below.
- Commit to version control:
Usage
Tips
- This skill focuses on Phase 6 of the broader modernization workflow. It expects Phase 2 (Business Rules Inventory) to be complete.
- The skill works best when both legacy and modernized code are in the same CoreStory project or when prior conversation threads contain the Business Rules Inventory.
- Keep the SKILL.md under 500 lines for reliable loading.
Skill File
Save as .claude/skills/behavioral-verification/SKILL.md:
GitHub Copilot
Add the following to .github/copilot-instructions.md:

(Optional) Add a reusable prompt file. Create .github/prompts/behavioral-verification.prompt.md:
Cursor
Create
.cursor/rules/behavioral-verification/RULE.md:
Factory.ai
Create
.factory/droids/behavioral-verification.md:

Troubleshooting
The Business Rules Inventory is incomplete or missing rules. This is the most common verification failure. If the inventory doesn’t cover all the rules in the component, verification will have gaps. Ask CoreStory: “Are there any business rules in the legacy [ComponentName] that are NOT in the Business Rules Inventory?” If significant rules are missing, go back to Business Rules Extraction and update the inventory before continuing.

Legacy behavior appears to be a bug — should the modernized version preserve it? This is a domain expert decision, not an engineering decision. Document the behavior, explain why you suspect it’s a bug, and escalate. Some “bugs” are actually undocumented requirements — customers may depend on the buggy behavior. The safe default is to preserve legacy behavior unless a domain expert explicitly approves the change.

The modernized component uses a completely different architecture — direct rule tracing is impossible. When the modernized code restructures logic significantly (e.g., extracting a rules engine, using event sourcing), direct file-to-file comparison won’t work. Shift to behavioral diffing: compare outcomes for specific scenarios rather than code structure. Ask CoreStory: “Given input [scenario], what does the legacy system produce? What does the modernized system produce? Are they equivalent?”

Too many rules to verify — the report would take weeks. Prioritize by risk. Verify critical rules (financial calculations, security, data integrity) first, then important rules (core business logic, workflow). Minor rules (formatting, display) can be verified later or deferred to runtime testing. Ask CoreStory to help classify: “Which of these rules have the highest impact if they regress?”

Static analysis is inconclusive for a rule. The legacy implementation is too complex, uses external state, or relies on runtime behavior that can’t be verified through code analysis alone. Escalate to Tier 2: generate a characterization test (golden master) for that specific rule. If the rule involves production-specific behavior (load-dependent, timing-dependent), escalate to Tier 3 (shadow traffic). Document which tier was used in the Behavioral Equivalence Report.

Integration point verification reveals incompatibilities during Coexist. If the modernized component’s API contract differs from legacy, the façade layer must translate. This is expected during Coexist — the façade exists precisely for this purpose. Document the translation the façade performs, verify the façade produces legacy-compatible output, and plan for façade removal when all consumers are updated.

Agent can’t access CoreStory tools. See the Supercharging AI Agents troubleshooting section for MCP connection issues. Verify the project has completed ingestion by calling list_projects and checking the status.