For Clinical Researchers:

At the Heart of Clinical Research: The Protocol
by Rebecca Daniels Kush

The protocol is core to every clinical research study; it is the plan. The protocol is used in designing the study, selecting investigative sites, developing the data collection tools, describing the study procedures and the analysis plan. Institutional Review Boards (IRBs) or Ethics Committees use the protocol as the basis for approving whether a study can be initiated. A well-constructed protocol can ensure common understanding of the study objectives and procedures to be implemented, thereby improving quality and saving time and effort for those using it. Clearly, it is one of the most important documents used in clinical research.


However, the development of a protocol can consume significant company resources and time, particularly when the review group is large or the review process is complex. Leveraging technology can streamline aspects of this process and/or be used to evaluate the integrity within a protocol before it is finalized. Developing such an application, however, requires that at least certain portions of the protocol be 'machine-readable' as well as 'human-readable', and implies at least some commonality across all protocols.

The value of being able to leverage technology in the clinical protocol arena, and to reuse sections of the protocol for such purposes as populating study tracking databases or clinical trial registries, study results and reporting, regulatory submissions, cooperative academic research and other related documentation, has been recognized by medical communication experts and many others in the research team. This prompted the initiation of a project within healthcare and clinical research standards development and research organizations to develop a protocol representation standard. Since 2002, a collaborative, multidisciplinary project group has made significant inroads towards this end.

The Goal: A Protocol Representation Standard
The majority of protocols are unique. This is the nature of clinical research. This may give the perception that standardizing protocols is impossible or that this would compromise the science behind the research. However, a review of multiple protocols (even across companies and therapeutic areas or different types of studies) demonstrates that there are certain sections and informational content found in all protocols. 'Standardizing' the protocol refers to identifying these common elements across protocols, defining them carefully and creating a list or a 'library' that provides for reuse from the protocol in other documents, such as final clinical study reports, publications or the next protocol in the development program, or in databases or systems such as those for trial registry. It does NOT imply that the studies are all alike nor that the trial designs are the same. In addition, there is no desire to restrict the trial design or the science or the creativity involved; the intent is to provide a standard way to represent protocols so that a machine can identify certain common elements.

Primary concepts that were determined to form the basis for achieving a standard protocol representation include the following:

  • There is a base set of common elements across protocols and these can be clearly identified and defined such that they can form a 'data layer' that can populate a database.
  • The protocol represents the plan for a trial (not the actual results, although the results can also populate the same or another database).
  • The protocol is not only the common Word or pdf file that can be printed out in a paper format, but also an analogous computerized representation that is 'machine-readable'.
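
As a rough illustration of the 'data layer' idea, the sketch below holds a few common protocol elements in a small Python structure and serializes them to JSON, a form that software can load into a database. The element names (`protocol_id`, `title`, `phase`, `planned_enrollment`), the sample values and the choice of JSON are assumptions made for illustration; they are not taken from the Protocol Representation standard.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ProtocolPlan:
    """Hypothetical subset of common protocol elements (the plan, not results)."""
    protocol_id: str
    title: str
    phase: str
    planned_enrollment: int


def to_machine_readable(plan: ProtocolPlan) -> str:
    """Serialize the plan so software can read it; sorted keys keep output stable."""
    return json.dumps(asdict(plan), sort_keys=True)


def from_machine_readable(text: str) -> ProtocolPlan:
    """Rebuild the plan from its serialized form."""
    return ProtocolPlan(**json.loads(text))


# Made-up example values, for illustration only.
example = ProtocolPlan(
    protocol_id="EXAMPLE-001",
    title="Illustrative study",
    phase="II",
    planned_enrollment=120,
)
encoded = to_machine_readable(example)
```

The same content could still be rendered as a printable document; the point of the sketch is only that the plan's elements are addressable by name rather than buried in free text.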

The Protocol Representation Group
The protocol representation standard was initiated as a project by leaders from CDISC and FDA within the Health Level Seven (HL7) Regulated Clinical Research Information Management (RCRIM) Technical Committee. Domain experts, including medical communication specialists, statisticians and project managers, were recruited from CDISC member companies to augment the initial group and to provide domain expertise including direct experience with protocol development for regulated clinical research. (Note that 'protocol' in the healthcare setting often refers to a treatment plan, which is substantially different from a clinical study/research protocol.)

The resulting group, the Protocol Representation Group (PR Group), is currently both an HL7 RCRIM project team and a CDISC team. It now includes representatives from the National Cancer Institute and 'observers' from FDA and EMEA, in addition to representatives of HL7 and CDISC. It is multidisciplinary, representing the major types of stakeholders in clinical trials. The initial scope statement of the group was to "identify standard elements of a clinical trial protocol that can be further elucidated and codified to facilitate study design, regulatory compliance, project management, trial conduct and data interchange among consumers and systems." However, this scope statement has now been expanded: "to develop a standard structured protocol representation that supports the entire lifecycle of clinical research protocols to achieve semantic interoperability (the exchange of content and meaning) amongst systems and stakeholders."

Progress and Next Steps
After much exploration and debate, the standardization of protocol representation was finally tackled by developing a set of decisions on the approach and a set of assumptions on what the resulting model should be.

The decisions:

  • Development of the model/standard should concentrate on content first and implementation second.
  • Elements must be defined in a glossary, since the industry uses multiple definitions for the majority of protocol elements.
  • Identify a core set of elements initially, and expand with further detail, as needed.
  • The initial set of elements will be based on ICH E6 and ICH E3 documents, which focus on efficacy and safety trials, but can be applied to other types of studies.

Assumptions included (as examples):

  • Structure and content of the model/standard should be intuitive and clearly understandable to industry stakeholders familiar with clinical trial data and should have straightforward and easy to follow rules.
  • The model/standard should be sufficiently flexible that it could be applied to any clinical study.
  • The model/standard should allow some degree of flexibility in the way that some information may be represented to support differing preferences within the industry.
  • The model/standard should not be limited to any one specific implementation and so risk rejection by industry stakeholders.

A spreadsheet of elements was created, with the section headers reflecting those from ICH E6, under which were added sub-sections and then elements. Each element was further elucidated with a glossary definition, the source of the element (e.g. ICH, EudraCT), suggested codelists and attributes, cardinality, use case application and other relevant information. This spreadsheet was used to develop an extensive glossary for the protocol and also for an initial modeling attempt to develop an HL7 Clinical Document Architecture (CDA) for a Structured Clinical Trial Protocol (SCTP). This modeling was an education for the PR Group, which generally had experience in protocol development but not in the development of HL7 models or messages. Through this effort, the PR Group learned:

  • It was not yet clear at that point how the protocol should best be modeled for a standard (CDAs were typically for smaller documents, the RIM was not well-understood by domain experts, and a new version of the CDA - perhaps more appropriate for protocols - was in development).
  • The protocol group must initially focus on the protocol as the plan only, not the eventual instantiations and permutations (e.g. amendments).
  • There must be a prioritization of use cases.
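
One row of the element spreadsheet described above might be modeled along the following lines. This is only a sketch: the field names, the `min_occurs`/`max_occurs` encoding of cardinality, and the sample 'Blinding' element with its codelist are assumptions, not entries from the PR Group's spreadsheet.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ElementDefinition:
    """One hypothetical spreadsheet row describing a protocol element."""
    name: str
    section: str                      # section header the element sits under
    definition: str                   # glossary definition
    source: str                       # e.g. "ICH" or "EudraCT"
    codelist: List[str] = field(default_factory=list)  # suggested allowed values
    min_occurs: int = 0               # cardinality lower bound
    max_occurs: Optional[int] = 1     # None means unbounded


def check_values(element: ElementDefinition, values: List[str]) -> List[str]:
    """Return the problems found when a protocol supplies `values` for `element`."""
    problems = []
    if len(values) < element.min_occurs:
        problems.append(f"{element.name}: expected at least {element.min_occurs} value(s)")
    if element.max_occurs is not None and len(values) > element.max_occurs:
        problems.append(f"{element.name}: expected at most {element.max_occurs} value(s)")
    if element.codelist:
        for value in values:
            if value not in element.codelist:
                problems.append(f"{element.name}: '{value}' is not in the codelist")
    return problems


# Made-up example element; the codelist values are illustrative only.
blinding = ElementDefinition(
    name="Blinding",
    section="Trial Design",
    definition="Whether treatment assignment is concealed.",
    source="ICH",
    codelist=["Open", "Single-blind", "Double-blind"],
    min_occurs=1,
    max_occurs=1,
)
```

Calling `check_values(blinding, ["Double-blind"])` returns an empty list, while a missing or unlisted value produces a message. Checks of this kind are one way the 'integrity within a protocol' could be evaluated before the protocol is finalized.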

The BRIDG Model
Concurrent with the initial model development efforts described above, the CDISC organization had initiated the development of an overarching model that would represent the clinical research domain. Following the HL7 Development Framework methodology, which uses Unified Modeling Language (UML), and with the assistance of an HL7 expert, the vision behind this domain analysis model was to harmonize clinical research (i.e. CDISC) standards with each other and to harmonize the clinical research standards with those of healthcare. The HL7 Development Framework provides a means of eliciting domain expertise from those who may not fully comprehend the HL7 Reference Information Model (RIM) and representing it in a UML diagram using verbiage that the domain experts do comprehend. The National Cancer Institute (NCI) became instrumental in collaborating with CDISC to progress this vision.

The clinical research domain analysis model was eventually named the Biomedical Research Integrated Domain Group (BRIDG) model because it will not only bridge various standards within the clinical research domain, but it is already bridging organizations. It has now been adopted as the HL7 RCRIM domain analysis model and has been a truly collaborative project among CDISC, NCI, HL7, FDA and others. It is an open model that is now being used in numerous implementation projects by CDISC, NCI, FDA and HL7 RCRIM. (See

At the very first BRIDG modeling session, CDISC participants (Directors from the Board) realized that the protocol is at the very center of clinical research, and that is where this domain analysis modeling began. Since the Protocol Representation Group was in a quandary as to the best way to model the protocol, it was decided that the group would focus its efforts on furthering progress for the BRIDG model. Its initial activity was to represent each element from the common elements spreadsheet it had developed in the BRIDG model. However, the group also realized that this modeling forced it to capture the relationships among the different elements. These relationships are important to articulate and to build into the model when developing a machine-readable version of the protocol.

Priority Use Cases and Implementations
With the domain analysis modeling efforts (BRIDG harmonization) under way through CDISC, NCI, HL7 and FDA, the PR Group prioritized a set of use cases and focused on the protocol as the plan. Of nearly a dozen use cases, the top three priorities are:

  1. Represent the CDISC Study Data Tabulation Model (SDTM), including Inclusion/Exclusion (Eligibility) Criteria, Study Design (Trial Design Model/Planned Assessments and Interventions) and Statistical Analysis Plan.
     Rationale: The SDTM is a published CDISC standard that is in data specifications in FDA Guidance for eSubmissions. If one is going to submit data/information to regulatory authorities using SDTM, a standard protocol should reflect this information in an analogous fashion. The FDA and NCI are implementing a cross-trial data warehouse that will include planned vs. actual data; the protocol clearly represents the planned information.
  2. Develop a standard for study tracking/summary/registration sections of a clinical study.
     Rationale: A standard for study tracking/registration has created a harmonization opportunity for many separate activities and organizations that desire/require reporting of similar content: the WHO International Clinical Trial Registry Platform, EMEA EudraCT, SDTM Study Summary and others. This content represents typical elements contained within a protocol such that population of tracking databases could be automated.
  3. Develop a common representation for the machine-readable protocol document.
     Rationale: This is the ultimate goal of the PR Group effort - to have an automated way to use information within the protocol in databases to reduce re-entry and rework while improving quality.

A structured protocol will not be limited to these three use cases; the opportunities for streamlining a clinical study are numerous once a protocol standard is available.

Current Status of the Protocol Representation Standard
The Protocol Representation Group successfully described the basic protocol elements in a spreadsheet, which was used for the next very important step - to represent the elements, along with attributes and appropriate relationships, in the BRIDG model. Although the BRIDG model began with the protocol at the focal point, increasingly more depth has now been harmonized into the BRIDG model based upon the protocol details. An implementable version of the PR standard is being developed by extending the CDISC Operational Data Model (ODM), an XML schema that now serves as the CDISC standard for transporting CRF data and related information; the extended ODM will become Version 1.0 of the Protocol Representation standard. (This should be completed before the end of 2008.)

For the BRIDG harmonization and ODM schema development, the Protocol Representation elements were grouped into sets as follows:

  a) Clinical Trial Registry (CTR) - the information/elements for trial registration (e.g. for EudraCT or the WHO International Clinical Trial Registry Platform), protocol/study tracking for management purposes, and study summary information from SDTM
  b) Eligibility Criteria - the basic criteria for inclusion and exclusion of a subject in a study
  c) Trial Design Model, Part I (TDM, Part I) - the basic design of the study, including arms and visits
  d) Trial Design Model, Part II (TDM, Part II) - the planned assessments and interventions for a study (including study calendar)
  e) Statistical Analysis Plan (SAP) - the elements from a SAP that are found in the protocol
  f) Other Protocol Template Sections - any other parts of a protocol that are not included in the first five groups

Protocol Representation Standard Version 1.0 will cover the first four groups above and will result in an XML implementation based upon the relevant portions of the BRIDG model.
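
The grouping above, and the statement that Version 1.0 covers the first four groups, can be sketched as a simple tagging scheme. The enum and function names and the sample element names below are hypothetical; only the six group labels and the Version 1.0 scope come from the text.

```python
from enum import Enum


class ElementGroup(Enum):
    """The six element groups named in the text; the letters follow its list."""
    CLINICAL_TRIAL_REGISTRY = "a"
    ELIGIBILITY_CRITERIA = "b"
    TRIAL_DESIGN_PART_I = "c"
    TRIAL_DESIGN_PART_II = "d"
    STATISTICAL_ANALYSIS_PLAN = "e"
    OTHER_TEMPLATE_SECTIONS = "f"


# Per the text, Version 1.0 covers the first four groups.
VERSION_1_0_SCOPE = frozenset({
    ElementGroup.CLINICAL_TRIAL_REGISTRY,
    ElementGroup.ELIGIBILITY_CRITERIA,
    ElementGroup.TRIAL_DESIGN_PART_I,
    ElementGroup.TRIAL_DESIGN_PART_II,
})


def in_version_1_0(elements):
    """Keep only (name, group) pairs whose group is in the Version 1.0 scope."""
    return [(name, group) for name, group in elements if group in VERSION_1_0_SCOPE]


# Hypothetical element names, tagged by group for illustration.
sample = [
    ("Registry identifier", ElementGroup.CLINICAL_TRIAL_REGISTRY),
    ("Inclusion criterion", ElementGroup.ELIGIBILITY_CRITERIA),
    ("Primary analysis method", ElementGroup.STATISTICAL_ANALYSIS_PLAN),
]
```

Here `in_version_1_0(sample)` keeps the registry and eligibility elements and drops the SAP element, mirroring the phased scope described above.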

Looking Forward
Once Protocol Representation Version 1.0 is available, the rest of the protocol (SAP and the additional sections and attachments) will be modeled and incorporated into the BRIDG model. Other steps that may occur going forward are: a) development of an HL7 V3 message implementation using the same content as that used for the ODM implementation; b) further development by vendors of protocol authoring tools; c) development of a means to automate the population of trial registries/tracking databases and regulatory databases; d) the linking of the protocol standard with clinical study results reporting; and e) development of a standard means to automate the generation of case report forms (e.g. CDASH-based eCRFs) from the PR Standard.

The Protocol Representation standard, like the BRIDG model, applies to any protocol-driven research. If implemented broadly and in concert with the other CDISC standards, it is expected to have a significant impact in reducing study start-up time and resources and improving protocol quality and integrity.

Rebecca Daniels Kush, Ph.D. is a Founder and the current President and CEO of the Clinical Data Interchange Standards Consortium (CDISC), a non-profit organization with a mission to develop and support global, platform-independent data standards that enable information system interoperability to improve medical research and related areas of healthcare.
