Improving Data and Content Migration Testing

For Life Science Companies

In our previous article we outlined the nature of risks faced by regulated companies undertaking data migrations. We also gave some ideas on how that risk needs to be addressed and mitigated, primarily through the implementation of an appropriate and effective testing strategy. This article aims to continue that discussion and elaborate on what makes an appropriate and effective testing strategy for your particular environment.

This article also introduces a broad set of data migration testing techniques, beginning with pre-migration testing. These techniques are well established and offer a number of benefits. After a review of these techniques, a set of "top ten" recommendations is provided for the design of an appropriate migration testing strategy.

Traditional Migrations

Traditionally, migrations have been tested using some form of post-migration testing, often limited to sampling. While this certainly has a role for some migrations, it starts relatively late in the overall process, is labor intensive, and misses many data-level errors. These limitations are particularly significant in highly regulated companies, where the margin of error inherent in sampling is often unacceptable.

Pre-Migration Testing

Pre-migration testing is rarely, if ever, covered during migration planning. Migration professionals are generally not aware of comprehensive pre-migration testing or of the value it can add to a migration, particularly a complex one.

Pre-migration testing takes place prior to the actual migration of any data, including test migrations. If done properly, pre-migration testing will assist with:

  • Verifying the scope of source systems and data sets with the user community and IT. Verification should cover both the data to be included and the data to be excluded and, if applicable, should be tied to the specific queries used for the migration.
  • Defining the high-level source-to-destination mappings for each category of data or content and verifying that the corresponding type has been defined in the destination system.
  • Verifying destination system data requirements such as field names, field types, mandatory fields, valid value lists and other field-level validation checks.
  • Using the source-to-destination mappings to test the source data against the requirements of the destination system. For example, if a destination field is mandatory, pre-migration testing helps ensure that the corresponding source field is not null; if a destination field has a list of valid values, it helps ensure that the corresponding source field contains only those values.
  • Testing the fields that uniquely link source and target records and ensuring that there is a definitive mapping between the record sets.
  • Testing the source and target system connections from the migration platform.
  • Testing the migration tool configuration against the migration specification, which can often be done through black-box testing on a field-by-field basis. With careful design, this testing can also verify that the migration specification's mappings are complete and accurate.
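The check of source data against destination requirements lends itself to automation. The following Python sketch is one possible approach. It assumes that source records are available as dictionaries and that the destination field requirements and the source-to-destination mappings have already been captured from the migration specification. `FieldRequirement` and `check_source_records` are hypothetical names and are not part of any particular migration product.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class FieldRequirement:
    """Constraints on one destination field (hypothetical, simplified)."""
    mandatory: bool = False
    valid_values: Optional[Set[str]] = None  # None means no valid value list
    max_length: Optional[int] = None


Problem = Tuple[Optional[str], str]  # (source record key, description)


def check_source_records(records: List[dict], mapping: Dict[str, str],
                         requirements: Dict[str, FieldRequirement],
                         key_field: str) -> List[Problem]:
    """Test source data against destination requirements before migrating.

    mapping:      destination field name -> source field name
    requirements: destination field name -> FieldRequirement
    key_field:    source field that uniquely links source and target records
    """
    problems: List[Problem] = []

    # Specification check: every mandatory destination field needs a source.
    for dest_name, req in requirements.items():
        if req.mandatory and dest_name not in mapping:
            problems.append((None, f"mandatory field '{dest_name}' has no source mapping"))

    seen_keys = set()
    for rec in records:
        key = rec.get(key_field)
        # The linking key must be present and unique for a definitive mapping.
        if key in (None, ""):
            problems.append((key, f"missing link key '{key_field}'"))
        elif key in seen_keys:
            problems.append((key, f"duplicate link key '{key_field}'"))
        else:
            seen_keys.add(key)

        # Field-level checks driven by the source-to-destination mapping.
        for dest_name, src_name in mapping.items():
            req = requirements.get(dest_name, FieldRequirement())
            value = rec.get(src_name)
            if value in (None, ""):
                if req.mandatory:
                    problems.append((key, f"'{src_name}' is empty but '{dest_name}' is mandatory"))
                continue
            if req.valid_values is not None and value not in req.valid_values:
                problems.append((key, f"'{src_name}'={value!r} is not a valid '{dest_name}' value"))
            if req.max_length is not None and len(str(value)) > req.max_length:
                problems.append((key, f"'{src_name}' is longer than {req.max_length} characters"))
    return problems
```

Because the checks are driven entirely by the mapping and the requirements, the same function can be rerun whenever the specification changes. The problems it reports are candidates for cleansing or supplementing the source data before the first test migration.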

Formal Design Review

Conduct a formal design review of the migration specification when pre-migration testing is nearly complete or during the earliest stages of migration tool configuration. The specification should include a definition of the source systems, the source systems' data sets and queries, the mappings between source system fields and destination system fields, the number of source records, the number of source system records created per unit time (used to plan migration timing and downtime), identification of supplementary sources, data cleansing requirements, performance requirements and testing requirements. The formal design review should include representatives from the appropriate user communities, IT and management. Its outcome should include a list of open issues, the means to close each issue and approve the migration specification, and a process for keeping the specification in sync with the migration tool configuration, which tends to change continuously until the production migration.
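The record counts and creation rate in the specification feed directly into downtime planning. A rough estimate, sketched below in Python, divides the number of records expected at cutover by the measured migration throughput and then adds the time needed to verify the result. The function name and parameters are illustrative assumptions. Real planning would also allow for contingency.

```python
def estimate_downtime_hours(current_records: int,
                            created_per_day: float,
                            days_until_cutover: float,
                            migrated_per_hour: float,
                            verification_hours: float) -> float:
    """Rough downtime estimate for a production migration (hypothetical helper).

    Records keep being created in the source system until cutover, so the
    volume to migrate is the current count plus the expected growth.
    """
    if migrated_per_hour <= 0:
        raise ValueError("migrated_per_hour must be positive")
    records_at_cutover = current_records + created_per_day * days_until_cutover
    migration_hours = records_at_cutover / migrated_per_hour
    # Verification is part of the outage: users wait until it passes.
    return migration_hours + verification_hours


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    print(round(estimate_downtime_hours(1_000_000, 2_000, 30, 50_000, 6), 1))  # 27.2
```

With these hypothetical figures, 1,060,000 records at 50,000 records per hour take 21.2 hours. Adding 6 hours of verification gives an estimated outage of about 27.2 hours. The throughput figure should come from post-migration throughput testing rather than from an assumption.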

Post-Migration Testing

Once a test migration is complete, additional end-to-end testing can be executed. Expect a significant number of errors to be identified during the initial test runs, although there will be fewer if thorough pre-migration testing has been carried out. Post-migration testing is typically performed in a test environment and includes:

  • Test the Throughput of the Migration Process (number of records per unit time) - This testing verifies that the planned downtime is sufficient. For planning purposes, include the time needed to verify that the migration process completed successfully.
  • Compare Migrated Records to Records Generated by the Destination System - Ensure that migrated records are complete and consistent in structure and context with records created natively in the destination system.
  • Summary Verification - Several techniques provide summary information, including record counts and checksums. For example, the number of records migrated is compiled from the destination system and compared to the number of records selected for migration from the source. This approach provides only summary information; if an issue exists, it rarely provides insight into the issue's root cause.
  • Compare Migrated Records to Sources - Tests should verify that field values are migrated as per the migration specification. In short, the source values and the field-level mappings are used to calculate the expected results at the destination. This testing can be completed using sampling where appropriate or, if the migration includes data that poses significant business or compliance risk, 100% of the migrated data can be verified using an automated testing tool. The advantage of the automated approach is its ability to identify errors that occur rarely (the proverbial needles in a haystack). Additionally, because an automated testing tool can be configured in parallel with the migration tool, 100% verification is available immediately after the first test migration. Compared with sampling, automated testing can save significant time and reduces the iterative test, debug and retest cycle typical of sampling.
    Migrated content has special considerations. Where content is migrated without change, testing should verify that the integrity of the content is maintained and that the content is associated with the correct destination record. This can be completed using sampling or, as described above, automated tools can verify 100% of the result.
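The record-to-source comparison described above can be sketched as follows. The sketch assumes that records from both systems can be exported as Python dictionaries and that each destination record carries its source key. Each conversion in the specification is re-implemented in `transforms`, separately from the migration tool, so that the check stays independent of the tool under test. Content integrity is handled the same way: a SHA-256 digest of the source file is compared with one computed from the file as stored in the destination. All function and field names here are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Tuple

Discrepancy = Tuple[Optional[str], Optional[str], str]  # (key, field, detail)


def content_digest(data: bytes) -> str:
    """SHA-256 digest of a file's bytes; compute it on both systems and compare."""
    return hashlib.sha256(data).hexdigest()


def verify_migration(source: List[dict], destination: List[dict],
                     mapping: Dict[str, str],
                     transforms: Dict[str, Callable],
                     src_key: str, dest_key: str) -> List[Discrepancy]:
    """Compare every migrated record with the values the specification predicts.

    mapping:    destination field -> source field
    transforms: destination field -> independent re-implementation of the
                specified conversion (identity if absent)
    """
    found: List[Discrepancy] = []

    # Summary verification: a count mismatch shows *that* something is wrong.
    if len(source) != len(destination):
        found.append((None, None,
                      f"count mismatch: {len(source)} source, {len(destination)} destination"))

    dest_by_key = {rec.get(dest_key): rec for rec in destination}
    for src in source:
        key = src.get(src_key)
        dest = dest_by_key.get(key)
        if dest is None:
            found.append((key, None, "record missing from destination"))
            continue
        # Field-level verification: shows *which* value is wrong.
        for dest_field, src_field in mapping.items():
            convert = transforms.get(dest_field, lambda v: v)
            expected = convert(src.get(src_field))
            actual = dest.get(dest_field)
            if expected != actual:
                found.append((key, dest_field, f"expected {expected!r}, found {actual!r}"))
    return found


if __name__ == "__main__":
    source = [
        {"id": "A1", "category": "sop", "title": "Cleaning", "file": b"cleaning v1"},
        {"id": "A2", "category": "protocol", "title": "Stability", "file": b"stability v3"},
    ]
    destination = [  # content_sha256 computed from the file stored in the destination
        {"legacy_id": "A1", "doc_type": "SOP", "title": "Cleaning",
         "content_sha256": content_digest(b"cleaning v1")},
        {"legacy_id": "A2", "doc_type": "PROTOCOL", "title": "Stab.",
         "content_sha256": content_digest(b"stability v3")},
    ]
    mapping = {"doc_type": "category", "title": "title", "content_sha256": "file"}
    transforms = {"doc_type": str.upper, "content_sha256": content_digest}
    print(verify_migration(source, destination, mapping, transforms, "id", "legacy_id"))
    # [('A2', 'title', "expected 'Stability', found 'Stab.'")]
```

Unlike a summary count, each discrepancy names the record and the field involved, which points directly toward the root cause. The sketch assumes every source record has its content available. A missing file would raise an error in `content_digest` rather than being reported as a discrepancy.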

User Acceptance Testing

Functional subtleties related to the co-mingling of migrated data and data created in the destination system may be difficult to identify early on in the migration process. User acceptance testing provides an opportunity for the user community to interact with legacy data in the destination system prior to production release, and most often, this is the first such opportunity for the users. Attention should be given to reporting, downstream feeds, and other system processes that rely on migrated data.

Production Migration

Even thorough testing before the production migration does not guarantee that the production process will complete without error. Challenges at this point include procedural errors and, at times, production system configuration errors. If an automated testing tool was used for post-migration testing of data and content, executing another test run is straightforward and recommended. If an automated approach was not used, some level of sampling or summary verification is still recommended.
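When sampling is used to check the production run, drawing the sample reproducibly makes the verification easier to document and repeat. Below is a minimal Python sketch that uses a seeded simple random sample. The function name is hypothetical, and the sample size should come from your own risk assessment rather than from this example.

```python
import random
from typing import Iterable, List


def select_sample(record_keys: Iterable[str], sample_size: int, seed: int) -> List[str]:
    """Draw a reproducible simple random sample of migrated record keys.

    Recording the seed lets a reviewer regenerate the same sample later
    (with the same Python version), which supports documented verification.
    """
    keys = sorted(record_keys)  # sort so the result does not depend on export order
    if sample_size >= len(keys):
        return keys  # small populations: verify everything
    rng = random.Random(seed)  # private generator; does not disturb global state
    return sorted(rng.sample(keys, sample_size))
```

A seeded sample does not remove the basic limitation of sampling: rare errors can still fall outside it. It does, however, make clear exactly which records were checked.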

Recommendations for the Design of Migration Testing Strategies

In the context of data and content migrations, business and compliance risks are a direct result of migration error. A thorough testing strategy minimizes the likelihood of data and content migration errors. The list below provides a set of recommendations to define such a testing strategy for a specific system:

  1. Establish a comprehensive migration team, including representatives from the user community, IT and management. Verify that each team member has the appropriate level of experience, and provide training as required on data migration principles and on the source and destination systems.
  2. Analyze the business and compliance risks associated with the specific systems being migrated. These risks should form the basis of the data migration testing strategy.
  3. Create, formally review and manage a complete migration specification - While this is easy to state, very few migrations take this step.
  4. Verify the scope of the migration with the user community and IT. Understand that the scope of the migration may be refined over time as pre- and post-migration testing may reveal shortcomings of this initial scope.
  5. Identify (or predict) likely sources of migration error and define specific testing strategies to identify and remediate these errors. This becomes easier with experience, and the error categories and conditions discussed in this series provide a good starting point.
  6. Use the field-level source-to-destination mappings to establish data requirements for the source system, and use these requirements to complete pre-migration testing. If necessary, cleanse or supplement the source data.
  7. Complete an appropriate level of post-migration testing. For migrations where errors need to be minimized, 100 percent verification using an automated tool is recommended. Ensure that this automated testing tool is independent of the migration tool.
  8. If there is some concern about the costs, time commitment or the iterative nature of migration verification via sampling, look closely at the ROI of automated testing.
  9. Complete User Acceptance Testing with migrated data. This tends to identify application errors that occur even when data has been migrated as designed.
  10. Test the production run. If an automated testing tool was chosen, it is likely that 100 percent of the migrated data can be tested here with minimal incremental cost or downtime. If a manual testing approach is being used, complete a summary verification.

When selecting the testing strategy that is right for you, it is important to understand that the available options extend well beyond the traditional sampling approach. Evaluate the role pre-migration testing could play in your project, understand the efficiency and cost implications of sampling versus automated testing, and don't underestimate the impact that an effective design review and user testing can have on a migration project. Doing so will not only increase the likelihood of your migration's success but will also save time and money along the way.

David Katzoff has, over the past 20 years, architected and developed technology that has been used in some of the largest document migration and consolidation projects undertaken in the life sciences arena. Today David is a highly valued resource in many of the largest life sciences companies in the world and is responsible for designing and implementing migration strategies in critical document management environments.

David Katzoff is managing director of product development and chief architect for Valiance. He brings more than twenty years of software applications engineering, technical training, project management and compliant business solutions design experience to this role, much of it concentrated in enterprise-class content management for GxP data, content and processes.

David has spent much of the last seven years focusing on product and strategy for content management migrations. His work has been leveraged in some of the largest systems consolidation efforts completed to date, at clients such as Amgen, Pfizer, Wyeth, Celgene, Bayer-Schering, Covidien, BMW and others.

David earned a Bachelor of Science in Electrical Engineering from the University of Rochester in Rochester, New York, and is a member of the Phi Beta Kappa Society.