For Life Science Companies

Data and Content Migrations: Minimizing the Risk
by David Katzoff, Managing Director of Product Development and Chief Architect for Valiance


Compliance and business risk play a significant role in the implementation methodologies of corporate information systems, and the risks associated with these systems are, in general, well known. However, as part of the implementation process, many of these systems are populated with legacy data, and the compliance and business risks of migrating that legacy data and content into a new system are not as well understood. In this context, migration risk is a direct result of migration error. Moreover, industry testing strategies to mitigate such risk, or more specifically data migration error, lack consistency and are far from deterministic. This is the first of two articles presenting thoughts and recommendations on how such a testing strategy can be designed.

Valiance Partners has tested hundreds of data and content migrations, primarily in FDA-regulated industries (pharmaceuticals, medical devices, biotechnology and food products) as well as in the automotive and manufacturing industries. The information presented here includes lessons learned from our clients' quality control experience and the actual error histories from testing migrations of hundreds of thousands of fields and terabytes of content.

The recommended approach to designing a migration testing strategy is to document each risk and its likelihood of occurrence, and then to define the means to mitigate it through appropriate testing. Identifying risk is tricky, and much of the process will be specific to the system being migrated. An understanding of the type of data being migrated and the characteristics of the destination system are good starting points. There is also the reality that most migrations will encounter unexpected types of error. Here are a few actual examples:

  • An off-the-shelf, vendor-supplied import tool erroneously converted non-English characters to English characters (e.g., it converted every instance of "α" to "a"). The migration contained only one such instance; it happened to reside in a product name that was critical to the migration.

  • The migration of a large content management application (more than one million documents) included a single corrupt document. It was part of a new drug application submission.

  • A consolidation of several systems required updating and standardizing product names (a product name table update), one of many steps in a complex weekend process. The process was thoroughly tested and found acceptable; however, during the production run, this step was inadvertently omitted (it was late in the evening).

  • Initial production use of an application displayed blank fields where migrated data should have resided (the migrated data existed but did not satisfy the new system's field-level validation rules for display).
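Errors like the first and last examples above, a silent character substitution and data that fails the target system's display rules, are easy to miss by visual inspection but trivial to catch with a strict, character-level comparison of extracted source and target values. The sketch below assumes both systems have already been extracted into dictionaries keyed by record ID; the function and field names are illustrative, not part of any particular migration tool.

```python
# Compare extracted source and target field values record by record.
# A lossy conversion such as "α" -> "a" fails strict equality even
# though the two values may look similar on casual inspection.

def reconcile(source: dict, target: dict, fields: list) -> list:
    """Return (record_id, field, source_value, target_value) tuples for
    every discrepancy between two extracts keyed by record ID."""
    errors = []
    for record_id, src_row in source.items():
        tgt_row = target.get(record_id)
        if tgt_row is None:
            # Record missing from the target system entirely.
            errors.append((record_id, None, src_row, None))
            continue
        for field in fields:
            src_val, tgt_val = src_row.get(field), tgt_row.get(field)
            if src_val != tgt_val:  # strict, character-level comparison
                errors.append((record_id, field, src_val, tgt_val))
    return errors

source = {"P-001": {"product_name": "Interferon α-2b"}}
target = {"P-001": {"product_name": "Interferon a-2b"}}  # silent substitution
print(reconcile(source, target, ["product_name"]))
```

A check of this kind compares every field of every record, so the single critical "α" would be flagged even if it occurs once in millions of values.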

If these risks or error conditions can be predetermined, designing the testing strategy is straightforward. But because migrations yield many "needle-in-a-haystack" error conditions, designing a testing strategy can become a complex affair. In an attempt to create a comprehensive list of error conditions, Valiance began logging and categorizing actual error conditions. The list below presents these categories and a few related error conditions:

  • Mapping/Specification Errors
    • Mappings defined for fields that do not exist in the target system
    • Mappings that populate source data into incorrect target fields
    • Incorrect data type mappings
  • Migration Tool Errors
    • Incorrect configuration of source-to-target data mappings
    • Multi-value sources configured such that only the first value is populated
    • Data transformation errors, i.e., the tool's calculations are not correct
  • Content Errors
    • Content corruption
    • Incorrect content placeholder is migrated
    • Missing content, e.g., missing PDF renditions
    • Mapping to incorrect locations, e.g., folders
  • Data Cleansing
    • Manual data-entry errors
    • Cleansed values are invalid
    • Source records inadvertently are excluded from cleansing activities (scoping)
  • Data-level Errors
    • Null values mapped to mandatory fields
    • Data truncation
    • Invalid values migrated
  • Process Errors
    • Manual steps that were inadvertently excluded or executed incorrectly
    • Tool changes/customizations that were not completely tested
  • Functional Gaps
    • These are issues that arise from a gap between the expectations of the legacy system's users and the new system's functionality. While some might consider these issues to lie outside the data migration, it is prudent to ensure that this class of error is appropriately managed.
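Among the content errors above, corruption in particular lends itself to automated detection: if the migrated file is byte-for-byte identical to the source, its cryptographic checksum will match. A minimal sketch, assuming the tester can enumerate pairs of source and target file paths (the pairing mechanism is not specified here and would depend on the migration's audit trail):

```python
# Detect content corruption by comparing checksums of source and
# migrated files; identical bytes always yield identical digests.
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def find_corrupt(pairs):
    """pairs: iterable of (source_path, target_path) tuples.
    Returns the pairs whose file contents differ."""
    return [(s, t) for s, t in pairs
            if sha256_of(Path(s)) != sha256_of(Path(t))]
```

Because every document is checked, this approach would have caught the single corrupt submission document in the million-document migration described earlier, something no practical sampling plan could guarantee.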

A complete list of error categories and associated error conditions will be published shortly.

Traditional Approach to Data and Content Migration Testing

If mitigating data migration risk requires appropriate testing to minimize migration error, what are the options?

The de facto approach to testing data and content migrations relies on sampling: some subset of random data or content is selected and inspected to ensure the migration was completed "as designed." Those who have tested migrations this way are familiar with the typical iterative test, debug and retest cycle, in which subsequent executions of the testing process reveal different error conditions as new samples are reviewed.

Sampling works, but it relies on an acceptable level of error and on an assumption of repeatability. An acceptable level of error implies that less than 100 percent of the data will be migrated without error, and the detectable error level is inversely proportional to the number of samples tested (refer to sampling standards such as ANSI/ASQ Z1.4). As for the assumption of repeatability, the fact that many migrations require four, five or more iterations of testing with differing results implies that one of the key tenets of sampling is not upheld, i.e., that "nonconformities occur randomly and with statistical independence..."
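The trade-off between sample size and residual error can be made concrete. Under the standard assumption that errors occur randomly and independently, inspecting n randomly chosen records and finding zero errors only supports an upper confidence bound on the true error rate, obtained by solving (1 - p)^n = 1 - confidence for p (for 95 percent confidence this is approximately 3/n, the so-called "rule of three"). A short sketch of the arithmetic:

```python
import math

def max_error_rate(sample_size: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on the true error rate when a random
    sample of `sample_size` records is inspected and no errors are
    found: solve (1 - p)**n = 1 - confidence for p."""
    return 1.0 - (1.0 - confidence) ** (1.0 / sample_size)

# Roughly: 59 clean samples bound the error rate near 5 percent,
# 299 near 1 percent, and ~3,000 near 0.1 percent.
for n in (59, 299, 2995):
    print(n, round(max_error_rate(n), 5))
```

The point is not the exact constants but the shape of the curve: driving the bound toward zero requires inspecting nearly everything, which is why sampling alone cannot rule out the needle-in-a-haystack errors described above.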

Depending on specific requirements, sampling may have a role in a well-defined migration testing strategy. But what alternative approaches may be more appropriate for other testing scenarios?

Valiance's next article, scheduled for publication in January, will describe the various options for data migration testing and provide a set of recommendations to create a data migration testing strategy that minimizes the chance of error for a specific migration.

David Katzoff is managing director of product development and chief architect for Valiance. Over the past 20 years he has architected and developed technology used in some of the largest document migration and consolidation projects undertaken in the life sciences arena, and today he is responsible for designing and implementing migration strategies in critical document management environments at many of the largest life sciences companies in the world. He brings more than twenty years of software applications engineering, technical training, project management and compliant business solutions design experience to the role, much of it concentrated in enterprise-class content management for GxP data, content and processes.

David has spent much of the last seven years focusing on product and strategy for content management migrations. His efforts have been leveraged in some of the largest systems consolidation efforts completed to date, at such clients as Amgen, Pfizer, Wyeth, Celgene, Bayer-Schering, Covidien, BMW and others.

David earned a Bachelor of Science in Electrical Engineering from the University of Rochester in Rochester, New York, and is a member of the Phi Beta Kappa Society.
