
TSB Bank Data Migration Failure: Lessons in Data Testing

Background

On July 8, 2015, TSB Bank was acquired by the Banco Sabadell Group for £1.7 billion. One of the key projects that followed involved migrating TSB's 5 million customers to Proteo4UK, a new platform built by SABIS. However, data testing failures, combined with other organizational and technical shortcomings, created a perfect storm of disruption for TSB customers and resulted in significant financial losses for the bank.

 

Some Facts

| Figure | Category | Party | Description | Source |
| --- | --- | --- | --- | --- |
| £318,000,000 | Direct migration cost | TSB | Money spent on the migration project itself. | The Register |
| £247,000,000 | Remediation cost | TSB | Additional post-migration costs for customer compensation, technical fixes, and increased customer-service staffing. | The Mirror |
| £48,650,000 | Regulatory fines | TSB | Fines paid by TSB Bank to the FCA and PRA. | Reuters |
| £81,620 | Regulatory fine | Former CIO | Fine imposed by the Prudential Regulation Authority (PRA) on TSB's former Chief Information Officer. | Bank of England |
| 1,900,000 | Customers unable to view their accounts | Customers | Customers lost access to their accounts; overseas ATM withdrawals were declined, digital banking stopped entirely, and many unauthorized transactions were suspected. | BBC |
| 1,300 | Customers with money stolen | Customers | Money stolen from accounts, in some cases life savings, by fraudsters exploiting the bank's IT meltdown. | The Guardian |

The lessons from this costly mishap serve as a cautionary tale for organizations undertaking complex system migrations. They highlight the critical need for comprehensive data migration planning, rigorous data testing, robust quality control, and effective risk mitigation to ensure a seamless transition. In this document, we share our perspective on the significance of data testing. When data testing fails, the consequences can be severe, as demonstrated by TSB's experience, a lesson we can all learn from.

Multiple Points of Failure

The TSB Bank IT migration failure in 2018 was not solely due to data testing issues; independent audits and reviews by IBM, E&Y, and Slaughter and May identified multiple points of failure.

 

A. Unrealistic Deadlines: The go-live date was predetermined, with the plan worked backwards from that date (a right-to-left approach) rather than from the work required.

B. Communication Breakdown: Lack of proper communication across the 1,400 people involved in the project.

C. Project Complexity and Scope: New functionality was added during the ongoing data migration process.

D. Reliance on External Vendors: More than 70 third-party suppliers were involved.

E. Lack of Proper Data Testing: No reference was found for a dedicated data testing team or data test-automation platform.

  1. Big Bang Release: Big Bang Migration on Sunday, April 22, 2018 – Slaughter and May, 2.13, page 4.
  2. Migrating while Upgrading: Upgrading from Proteo3 to Proteo4 while Migrating Data – Slaughter and May, 2.14, page 4.
  3. Data Testing Ownership: Confusing ownership of data testing – Slaughter and May, 11.6, page 95.
  4. Source Data Access: TSB did not own or control the source platform – Slaughter and May, 7.24, page 56.
  5. Lack of Full Volume Testing: Did not fully replicate the Production Environment – Slaughter and May, 11.7/C, page 96.
  6. Insufficient Time for Data Testing: Inability to complete testing of data migration… Key factor for delay – Slaughter and May, 11.41, page 104.
  7. Partial Data Testing: Only read-only transactions tested, not updatable transactions – Slaughter and May, 18.30/A(iii), page 179.
  8. Incomplete Data Testing Cycle: Open defects should reduce towards the end of UAT… instead it grew – Slaughter and May, 17.8, page 166.
  9. Manual Data Testing: Automated data testing tool not listed – Slaughter and May, 11.6, page 95.

1. Big Bang Release

Big Bang Migration on Sunday 22nd April 2018 – Slaughter and May, 2.13, page 4

The project involved switching to a new system, upgrading it, and migrating millions of customers and data accumulated over many years. The team carried out a few small initial migrations, then moved everything else in a single large, risky cutover.

Observations

  • Limited Cutovers: Initial cutovers were performed with small datasets, followed by a large and risky migration.
  • Missed Corner Cases: The limited datasets meant edge cases that larger datasets would have exposed went undetected.
  • Post-Go-Live Issues: Many data issues were only discovered after the system went live, leading to avoidable complications.

Takeaways

  • Tranche-Based Migration: Migrate customers in phases, starting with simpler, low-volume cases, and progressively increasing complexity and volume.
  • Large Dataset Testing: Use substantial amounts of data during multiple trial runs to uncover hidden test cases.
  • Pre-Go-Live Validation: Testing with larger datasets before go-live helps identify potential data issues in advance.
  • Automated Data Testing: Invest in an automated data testing tool to enable thorough testing at scale, reducing reliance on manual efforts.

2. Migrating Data While Upgrading

Upgrading from Proteo3 to Proteo4 while Migrating Data – Slaughter and May, 2.14, page 4

The project highlighted the challenges of data testing in an incomplete and unstable system. A typical database captures events and persists data representing the application’s final state, including reference, master, and transaction data. However, when the system under development is incomplete, it leads to missing data scenarios, impacting the effectiveness of data testing. This results in critical test cases being overlooked, posing significant risks to the accuracy and reliability of the data.

Observations

  • Event and Data Persistence: Databases typically capture events and persist data during application operation, recording the final state of reference, master, and transaction data.
  • Incomplete Development Impact: If the system under development is incomplete, the data it persists will miss certain scenarios.
  • Missed Test Cases: Incomplete data scenarios lead to critical test cases being overlooked during data testing.

Takeaways

  • Application First: Certify the software application before proceeding with data certification.
  • Avoid Testing Unstable Systems: Refrain from conducting data testing while the system is still under development and unstable.
  • Allocate Extra Time: If unit data testing is performed during development, ensure additional time is allocated for comprehensive data testing once the application is fully developed.

3. Data Testing Ownership

Confusing ownership of data testing – Slaughter and May, 11.6, page 95

Data migration projects often face challenges due to the lack of focus and ownership in data testing. While other types of testing typically have designated teams and clear responsibilities, data testing is frequently overlooked or misunderstood, leading to critical gaps in quality assurance. Effective execution of the Software Development Life Cycle (SDLC) requires not only adherence to proper steps but also ensuring data testing receives the attention it deserves.

 

Observations

  • Shared Responsibility: The chart shows that data migration testing was a joint effort between two companies.
  • Neglected Data Testing: While other types of testing had designated teams, data testing was overlooked, misunderstood, or lacked the necessary expertise.
  • Importance of SDLC Order: The Software Development Life Cycle (SDLC) involves numerous crucial steps, each of which must be executed meticulously and in the correct sequence.

Takeaways

  • Prioritize Data Testing: Data testing is at least as important as any other project activity.
  • Designate a Data Testing Owner: Assign a specific individual to be responsible for overseeing all data testing activities.
  • Establish a Dedicated Data Testing Team: Create a specialized team solely focused on conducting thorough data testing.
  • Ensure Trained Resources: Ensure that all team members involved in data testing have the necessary training and expertise.

4. Source Data Access

TSB did not own or control the source platform – Slaughter and May, 7.24, page 56

The project involved building and validating a data pipeline to connect, transform, and load data across systems. Success depended on a clear understanding of the source data model and its dependencies. However, the development team lacked access to the source system, critical documentation, and subject matter experts (SMEs), leading to significant challenges and project failure.

 

Observations

  • Data Pipeline Workflow: A typical data pipeline connects to a data source, extracts data, applies transformations, and loads the results into a target database.
  • Dependency on Key Components: The integrity of the data process relies on the source system, transformation rules, and the target system.
  • Source Understanding: A clear understanding of the source data model and its contents is critical for both developing the data process and conducting effective data testing.
  • Access Challenges: In the TSB case, the development and implementation teams lacked access to the source system, leading to significant project challenges and eventual failure.

Takeaways

  • System Access: The data testing team must have access to both the source and target systems.
  • Comprehensive Documentation: Ensure access to source system documentation, including the data model and reference data (a short profiling sketch follows this list).
  • Engage Source SMEs: Even if the source system is being decommissioned, engage subject matter experts (SMEs) to ensure thorough understanding and support.
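
To make the data model and reference-data point concrete, here is a minimal profiling sketch in Python, assuming the testing team has a live connection to the source. The built-in sqlite3 module stands in for the real source database, and the account_type table is a hypothetical example, not TSB's actual data model.

```python
# A minimal sketch (assumption: the testing team has direct access to the source).
# Python's built-in sqlite3 stands in for the real source database; the
# hypothetical account_type table is for illustration only.
import sqlite3

def profile_source(conn):
    """List each table's columns, declared types, and row count."""
    cur = conn.cursor()
    tables = [r[0] for r in cur.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    profile = {}
    for table in tables:
        columns = cur.execute(f"PRAGMA table_info({table})").fetchall()
        profile[table] = {
            "columns": [(c[1], c[2]) for c in columns],  # (name, declared type)
            "row_count": cur.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0],
        }
    return profile

# Tiny demo with a hypothetical reference-data table.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE account_type (code TEXT, description TEXT)")
src.executemany("INSERT INTO account_type VALUES (?, ?)",
                [("CUR", "Current account"), ("SAV", "Savings account")])
for table, info in profile_source(src).items():
    print(table, info)
```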

5. Lack of Full Volume Testing

Did not fully replicate the Production Environment – Slaughter and May, 11.7/C, page 96

The project involved testing a system without fully replicating the production environment, resulting in missed opportunities to validate critical use cases. While application testing requires testers to identify unique use cases manually, data testing inherently provides hidden use cases within the data itself. The failure to leverage the full production dataset during testing led to overlooked corner cases and a suboptimal system post-go-live.

 

Observations

  • Unique Use Cases in Application Testing: In application testing, testers must identify unique use cases to create corresponding test cases and test data.
  • Data-Driven Testing Advantage: In data testing, hidden use cases already exist within the data and can be leveraged by testers.
  • Missed Opportunity: The project team failed to fully replicate the production database for testing, missing the chance to test the system comprehensively.
  • Post-Go-Live Issues: As a result, corner cases were overlooked, leading to a suboptimal system when it went live.

Takeaways

  • Utilize All Data: Use 100% of the available data for testing (a minimal reconciliation sketch follows this list).
  • Detect Corner Cases Automatically: Comprehensive data usage ensures automatic detection of corner cases.
  • Chaos Testing Advantage: Leveraging all available data functions like chaos testing, enabling testers to uncover hidden patterns in the data and improve system functionality.
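
Below is a minimal sketch of what full-volume reconciliation can look like, assuming DB-API connections to both platforms. sqlite3 in-memory databases stand in for the legacy and new systems, and the customer table and columns are hypothetical; a production run would stream or hash rows rather than fetch everything into memory.

```python
# A minimal sketch of full-volume reconciliation (assumptions: DB-API connections to
# both platforms; sqlite3 in-memory databases and the customer table are hypothetical).
# A production run would stream or hash rows instead of fetching everything at once.
import sqlite3

def reconcile_full_volume(src_conn, tgt_conn, table, key, columns):
    """Compare row counts and values for 100% of the data, ordered by `key`."""
    cols = ", ".join([key] + columns)
    query = f"SELECT {cols} FROM {table} ORDER BY {key}"
    src_rows = src_conn.execute(query).fetchall()
    tgt_rows = tgt_conn.execute(query).fetchall()
    value_mismatches = [(s, t) for s, t in zip(src_rows, tgt_rows) if s != t]
    return {
        "source_count": len(src_rows),
        "target_count": len(tgt_rows),
        "value_mismatches": value_mismatches,
    }

# Tiny demo: one migrated balance differs, which full-volume comparison catches.
src, tgt = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for conn in (src, tgt):
    conn.execute("CREATE TABLE customer (id INTEGER, balance REAL)")
src.executemany("INSERT INTO customer VALUES (?, ?)", [(1, 100.0), (2, 250.5)])
tgt.executemany("INSERT INTO customer VALUES (?, ?)", [(1, 100.0), (2, 205.5)])
print(reconcile_full_volume(src, tgt, "customer", "id", ["balance"]))
```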

6. Insufficient Time for Data Testing

Inability to complete testing of data migration… Key factor for delay – Slaughter and May, 11.41, page 104

The project faced significant delays due to insufficient time allocated for data testing, a critical requirement for data-centric applications. While application testing was prioritized, the lack of proper planning for data testing resulted in incomplete testing and project failure. Effective project timelines must account for the additional time and resources needed to ensure comprehensive data validation.

 

Observations

  • Additional Data Testing Needs: Data-centric applications require more than regular testing; they have specific data testing requirements.
  • Time Allocation: Project teams must allocate additional time for data testing to ensure both application and data testing are completed on schedule.
  • Missed Planning: In this project, insufficient time was allocated for data testing, resulting in a failed project due to incomplete testing.

Takeaways

  • Prioritize Data Testing: Allocate sufficient time for data testing in the project timeline.
  • Avoid Afterthoughts: Data testing should be an integral part of the project plan, not an afterthought.

7. Partial Data Testing

Only read-only transactions tested, not updatable transactions – Slaughter and May, 18.30/A(iii), page 179

The project’s data migration testing was incomplete, focusing only on read-only transactions while neglecting updatable ones. Comprehensive testing for data migration requires validating schemas, reconciling initial loads, and ensuring continuous data updates produce consistent results. The absence of continuous data reconciliation led to post-migration inconsistencies and highlighted the need for thorough planning and execution of all essential testing phases.

 

Observations

  • Essential Data Migration Tests: Data migration testing requires three distinct tests: schema comparison, initial load reconciliation, and continuous data reconciliation.
  • Schema Comparison: Validate that the new schema/data model includes all required entities, attributes, and data types.
  • Initial Load Reconciliation: Ensure the initial state of data in the legacy system matches the new system, establishing a consistent starting point.
  • Continuous Data Reconciliation: Verify that ongoing updates produce identical results in both systems.
  • Missed Planning: In this case, continuous data reconciliation was not planned, leading to inconsistencies post-migration.

Takeaways

  • Comprehensive Testing: Always conduct the three essential tests (a minimal sketch follows this list):
    – Schema Comparison
    – Initial Load Reconciliation
    – Continuous Data Reconciliation
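
As a minimal illustration of the first of these checks, the sketch below compares schemas between the two platforms, again using sqlite3 stand-ins and a hypothetical account table. Initial-load and continuous data reconciliation would then re-run row-level comparisons, such as the full-volume check sketched in lesson 5, both before go-live and after ongoing updates.

```python
# A minimal sketch of schema comparison, the first of the three checks, assuming
# sqlite3 stand-ins for the legacy and new platforms; the account table is hypothetical.
import sqlite3

def compare_schema(src_conn, tgt_conn, table):
    """Report columns missing, added, or re-typed between the two systems."""
    def schema(conn):
        # PRAGMA table_info returns (cid, name, type, notnull, default, pk) per column.
        return {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}
    src, tgt = schema(src_conn), schema(tgt_conn)
    return {
        "missing_in_target": sorted(set(src) - set(tgt)),
        "extra_in_target": sorted(set(tgt) - set(src)),
        "type_changes": {c: (src[c], tgt[c])
                         for c in set(src) & set(tgt) if src[c] != tgt[c]},
    }

# Tiny demo: the new schema drops one column and changes a data type.
legacy, new = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
legacy.execute("CREATE TABLE account (id INTEGER, sort_code TEXT, opened_on TEXT)")
new.execute("CREATE TABLE account (id INTEGER, sort_code INTEGER)")
print(compare_schema(legacy, new, "account"))
```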

8. Incomplete Data Testing Cycle

Open defects should reduce towards the end of UAT… instead it grew – Slaughter and May, 17.8, page 166

The project encountered a growing number of open defects during the User Acceptance Testing (UAT) phase, instead of the expected decline. This highlighted significant gaps in planning and execution, with critical issues left unresolved as the project neared completion. Starting data testing earlier in the project timeline and evaluating the impact of defects more rigorously could have mitigated these challenges and prevented cascading failures.

Observations

  • Defect Trends: Typically, the total number of open and newly discovered defects should decrease as a project nears completion. However, this was not the case in this migration project.
  • Planning Issues: This indicates significant negligence or a lack of proper project planning.
  • Escalating Defects: If defects show an upward trend, the project release date should be delayed until a thorough root cause analysis is conducted (a minimal trend-check sketch follows this list).
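
The sketch below is not from the report; it simply illustrates, with made-up daily counts, how an open-defect trend could be turned into an automatic release gate.

```python
# A hypothetical release gate on the open-defect trend during UAT (not from the report).
def defect_trend_allows_release(open_defects_by_day, window=7):
    """Block the release if open defects grew over the last `window` days of UAT."""
    if len(open_defects_by_day) < window + 1:
        return False  # not enough history to judge the trend
    return open_defects_by_day[-1] <= open_defects_by_day[-1 - window]

uat_open_defects = [120, 118, 121, 125, 130, 133, 140, 146, 151]  # made-up daily counts
print(defect_trend_allows_release(uat_open_defects))  # False: the trend is rising, so delay
```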

Takeaways

  • Shift Testing Left: Begin data testing efforts earlier in the project timeline to identify and address issues sooner.
  • Assess Impact: Evaluate the criticality of data issues based on their potential impact. Even small errors, such as incorrect reference data, can have catastrophic consequences on the system.

9. Manual Data Testing

Automated data testing tool not listed – Slaughter and May, 11.6, page 95

The project suffered due to the absence of an automated data testing tool, leading to manual testing inefficiencies and restricted testing capabilities. Despite involving multiple staffing agencies, the lack of automation negatively impacted timelines, budgets, and testing quality. Automation is essential for comprehensive testing and efficient regression runs in data-intensive projects.

Observations

  • Lack of Automation Planning: The project clearly did not account for data test automation.
  • Productivity Challenges: While many staffing agencies were involved, the absence of automation tools hindered productivity.
  • Limited Testing Capabilities: The lack of an automated data testing tool significantly restricted the team’s ability to conduct thorough testing.
  • Project Impact: This oversight directly affected project timelines, budget, and testing quality.

Takeaways

  • Essential Automation: Given the data volume, manual testing is impractical; an automated data testing tool is essential.
  • Regression Testing Support: Automated tools are crucial for conducting multiple regression test runs efficiently (a minimal sketch follows this list).
  • Automation Justification: Undertaking a multi-million-dollar project without test automation is “entirely unjustifiable”.
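
As a minimal illustration of automated regression runs, the sketch below re-executes the same reconciliation rules against both platforms on every run. The table names, checks, and sqlite3 stand-in connections are hypothetical examples, not TSB's actual tooling.

```python
# A minimal sketch of an automated regression suite: the same reconciliation rules are
# re-run on every cycle. Table names, checks, and sqlite3 stand-ins are hypothetical.
import sqlite3

RULES = [
    {"table": "customer", "check": "SELECT COUNT(*) FROM customer"},
    {"table": "account", "check": "SELECT COUNT(*), ROUND(SUM(balance), 2) FROM account"},
]

def run_regression(src_conn, tgt_conn, rules):
    """Execute every rule on both systems and report pass/fail for each."""
    results = []
    for rule in rules:
        src_val = src_conn.execute(rule["check"]).fetchone()
        tgt_val = tgt_conn.execute(rule["check"]).fetchone()
        results.append({"table": rule["table"], "passed": src_val == tgt_val,
                        "source": src_val, "target": tgt_val})
    return results

# Tiny demo with in-memory stand-ins for the legacy and new platforms.
src, tgt = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for conn in (src, tgt):
    conn.execute("CREATE TABLE customer (id INTEGER)")
    conn.execute("CREATE TABLE account (id INTEGER, balance REAL)")
    conn.execute("INSERT INTO customer VALUES (1)")
src.execute("INSERT INTO account VALUES (1, 99.10)")
tgt.execute("INSERT INTO account VALUES (1, 99.10)")
for result in run_regression(src, tgt, RULES):
    print(result)
```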

Conclusion

The TSB Bank migration failure shows how much robust data testing matters in large-scale data migrations. Our analysis highlights nine key lessons, each drawn from the project's shortcomings, to guide organizations in avoiding similar pitfalls. From ensuring full-volume testing and clear data testing ownership to prioritizing automation and allocating sufficient time, these lessons emphasize the need for proper planning and execution.

Data testing should be a priority in project planning, not an afterthought. It requires clear ownership, access to complete datasets, and tools that automate end-to-end data migration testing. The TSB case is a reminder that even small oversights can lead to major disruptions. By applying these lessons, organizations can better handle the challenges of data migration and ensure smoother transitions for their critical systems.


Sandesh Gawande

CEO and Founder at iceDQ.
First to introduce automated data testing. Advocate for data reliability engineering.
