Datavant Supports FDA Guidance with a Privacy-first Approach to Using Real World Evidence for Regulatory Decision Making

Doug Fridsma
December 1, 2021

Last week Datavant submitted comments on the FDA’s recently issued guidance, which helps clarify how real world data can be made “fit for purpose” in regulatory decision-making.

In our response, we applauded the FDA for advancing the use of real world data while highlighting a critical gap related to patient privacy. We recommended strengthening the guidance by taking a privacy-first approach to real world data and adding privacy standards to the agency’s guidance to industry.

Privacy-preserving linking technologies have matured rapidly in recent years and now make it possible to link data in ways that were not feasible even a few years ago. The FDA guidance on the use of real world data in regulatory decision-making, coupled with parallel innovations in how the industry conducts clinical trials, will take the industry closer to a future state where it is possible to augment gold standard clinical trials with linked, longitudinal RWD to support the ongoing safety and efficacy of drugs, devices, and diagnostics in the marketplace. Datavant believes in this vision, and we look forward to working closely with our partners in industry and at the FDA to ensure RWD continues to support scientific and therapeutic advances while preserving patient privacy.

A special thanks to Claire Cravero, Vera Mujac and Elenee Argentinis from Datavant, Dan Riskin from Verantos and Brigham Hyde from Eversana for their early input and review.

Below is the full text of our comments. For questions, concerns, or comments, please reach out to Doug Fridsma or Elenee Argentinis.


Datavant appreciates the opportunity to comment on the FDA Draft Guidance for Real-World Data: Assessing Electronic Health Records and Medical Claims Data to Support Regulatory Decision-Making for Drug and Biological Products.

Datavant works with over 500 institutions to connect health data in ways that preserve patient privacy. Our mission is to connect the world’s health data to improve patient outcomes. A network of healthcare and life sciences companies, non-profits, and government entities utilize our common infrastructure for the safe exchange and linking of patient-level health information. Many of our partners in the life sciences use our technology to leverage real world evidence in the entire clinical research life cycle. We improve the access, quality and value of health care data for our partners, while always protecting patient privacy.

The RWD guidance is an important step forward in providing clarity for the use of RWD to support RWE

We are pleased with the thoughtfulness and quality of the draft guidance released in September 2021. We agree that data source selection, curation, and analysis are critical to successfully using real world data in support of regulatory decision making, and we appreciate the comprehensive guidance on how to evaluate whether real world data is fit for purpose in improving study design, efficiently managing a clinical trial, and making regulatory decisions.

We also agree that high quality data will lead to higher quality regulatory decision making, and that real world data can (and should) be used to support regulatory decisions directly, used to support primary evidence, and used to manage pragmatic trials or support novel comparator arms.

Finally, we agree that linking different kinds of data is essential to enhancing the quality and completeness of real world data. For example, linking EHR and claims data can fill in information missing from a patient record, or improve accuracy by providing a second source against which the data can be validated. Linking EHR data across different organizations provides a more complete longitudinal record of a patient, as well as a better representation of the US population. We also believe that including laboratory data, patient-generated data from devices and apps, and other data sources that incorporate the socioeconomic and demographic determinants of health can capture data not found in traditional medical records. Linking data across different organizations, data types, and time frames is how we get to better data quality and completeness when using real world data.

In this guidance, the FDA has established a strong foundation for the use of real-world evidence in regulatory decision making. However, there remain some important considerations for the guidance going forward, and we have four specific recommendations to enhance the value of the RWE guidance.

FDA should make privacy of patient data a primary, overarching consideration for the use of real world data

We noticed there is very little mention of privacy in the draft guidance. Real world data drawn from EHRs, claims data, and apps and devices can compromise privacy if protecting a patient’s privacy is not considered at every step in the process. Although ensuring data privacy is important for all kinds of data used in regulatory decision making, our comments do not address situations in which a patient consents to the use of their identifiable data for clinical trial purposes. Instead, we are focused on situations in which data has been collected as part of the routine delivery of care and patients have not consented to the use of their data in a clinical trial.

As more health data is collected digitally and made available for analysis, there is a risk that a patient’s privacy could be compromised. Much of the data available in EHRs or in claims data sets is subject to HIPAA provisions, and there are well understood methods to remove identifiable information and then use that data for research purposes without the explicit consent of the patient. However, traditional methods of linking datasets often require identifiable information (name, medical record number, date of birth) to be shared so that patient information from one dataset can be properly linked to another. The process of de-identification removes much of the personally identifiable information (PII), making linkage across different datasets difficult without compromising the privacy of a patient.

In recent years, there has been renewed focus on privacy-preserving methods to link or analyze records across different datasets. These methods include advanced encryption techniques, differential privacy, secure multi-party computation, federated learning, and hashing and linkable redaction (pseudonymization). These approaches vary in complexity, maturity, and adoption, but each offers a way to protect patient identity while still allowing patient data to be linked or analyzed across different datasets.
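
To make the hashing approach concrete, here is a minimal sketch of keyed-hash pseudonymization. It is an illustration under stated assumptions, not a description of any particular vendor's tokenization: the field names, the normalization rules, and the `SECRET_KEY` value are all hypothetical, and a production system would manage keys and handle messy identifiers far more carefully.

```python
import hashlib
import hmac

# Hypothetical shared secret; a real deployment would manage keys outside the code.
SECRET_KEY = b"example-key-not-for-production"

# Hypothetical names for the identifying fields that get replaced by a token.
IDENTIFYING_FIELDS = ("first_name", "last_name", "date_of_birth")


def normalize(value):
    """Lowercase and remove whitespace so trivial formatting differences do not change the token."""
    return "".join(str(value).lower().split())


def make_token(first_name, last_name, date_of_birth, key=SECRET_KEY):
    """Return an HMAC-SHA256 hex digest of the normalized identifiers."""
    message = "|".join(normalize(v) for v in (first_name, last_name, date_of_birth))
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize(records, key=SECRET_KEY):
    """Replace the identifying fields of each record with a single linkage token."""
    tokenized = []
    for rec in records:
        token = make_token(*(rec[f] for f in IDENTIFYING_FIELDS), key=key)
        clean = {k: v for k, v in rec.items() if k not in IDENTIFYING_FIELDS}
        clean["token"] = token
        tokenized.append(clean)
    return tokenized
```

Two organizations that apply `tokenize` with the same key would produce the same token for the same patient, so their records could be joined on `token` without either side sharing names or birth dates. Using a keyed hash rather than a plain hash is a design choice in this sketch: without the key, an outsider cannot rebuild tokens by hashing guessed names and dates.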

We believe these methods are now sufficiently robust to serve the purposes of clinical trials research and regulatory decision-making, and provide significant benefit for patients and regulators. An emphasis on maintaining the privacy of a patient’s data drawn from real-world data should be a fundamental tenet of how real world data is used to support clinical trials research and regulatory decision making. These methods of de-identification can protect a patient’s privacy, while still allowing high quality analysis and evidence generation.

A second benefit of a privacy-first policy in the use of real world data is a reduced risk of re-identification. Re-identification can occur when two properly de-identified datasets are combined and the resulting dataset contains sufficient detail to allow someone to positively identify a patient from the combined dataset. This is particularly important when the data being combined includes pediatric patients, patients with rare diseases, molecular sub-groups, and under-represented populations.
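
One simple way to see how combining datasets raises this risk is to count how many records share each combination of indirect identifiers, a check in the spirit of k-anonymity. The sketch below is only an illustration: the field names and the threshold `k` are assumptions, and the comments do not say which statistical measures a formal risk assessment would use.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k):
    """Return each combination of quasi-identifier values shared by fewer than k records."""
    counts = Counter(tuple(rec[q] for q in quasi_identifiers) for rec in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical linked records: demographics from one source, diagnosis from another.
linked = [
    {"zip3": "941", "birth_year": 1980, "diagnosis": "asthma"},
    {"zip3": "941", "birth_year": 1980, "diagnosis": "asthma"},
    {"zip3": "941", "birth_year": 1980, "diagnosis": "rare_condition"},
]

# Demographics alone: all three records share one combination, so nothing is flagged.
demographics_only = small_groups(linked, ["zip3", "birth_year"], k=2)

# After adding the diagnosis field, one record becomes unique and is flagged.
with_diagnosis = small_groups(linked, ["zip3", "birth_year", "diagnosis"], k=2)
```

In this toy data, `demographics_only` is empty while `with_diagnosis` contains the single patient with the rare condition, which mirrors why small populations need particular care.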

Importantly, privacy-preserving methods for data linkage allow the FDA to take a privacy-first approach to real world data while still being able to leverage that data for study design, assessments of data quality, removal of duplicated medical records across merged datasets, and support of broad population analytics and post-marketing surveillance.

The FDA should require independent assurance and certification that real world data used in regulatory decision making has been properly de-identified

As described above, there is a real risk of re-identification when different datasets are combined, or when data from vulnerable populations is used in a clinical study. HIPAA provides two methods to assure that the re-identification risk is low. The first is Safe Harbor, in which the real world data is stripped of 18 data elements that contain personally identifiable information (PII). The second is Expert Determination, in which a qualified expert determines that there is a very small risk of re-identification in the combined or linked dataset. Statistical experts in this space include academic and private sector entities that are highly trained to support privacy protection at the high bar set by HIPAA. Both methods assure that once a dataset is linked and de-identified for the purposes of research, the risk of re-identification in the combined dataset is low.
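
As a rough illustration of the Safe Harbor idea, the sketch below drops a few direct identifiers and generalizes dates and ZIP codes. It is deliberately partial and is not a compliant implementation: it covers only a handful of the 18 categories, it does not handle the special treatment of ages over 89, and the field names and the `low_population_zip3` parameter are hypothetical.

```python
# Hypothetical field names for a few of the direct identifiers Safe Harbor removes.
DIRECT_IDENTIFIER_FIELDS = {"name", "phone", "email", "ssn", "medical_record_number"}


def safe_harbor_sketch(record, low_population_zip3=frozenset()):
    """Return a copy of the record with direct identifiers dropped and dates and ZIP codes generalized."""
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIER_FIELDS:
            continue  # direct identifiers are removed entirely
        if field == "zip":
            # Keep only the first three digits; sparsely populated areas become "000".
            zip3 = str(value)[:3]
            out["zip3"] = "000" if zip3 in low_population_zip3 else zip3
        elif field == "date_of_birth":
            # Keep only the year; assumes an ISO-style YYYY-MM-DD string.
            out["birth_year"] = str(value)[:4]
        else:
            out[field] = value
    return out
```

For example, a record with a name, a Social Security number, ZIP code 94110, and birth date 1980-02-03 would come out with only `zip3` "941", `birth_year` "1980", and its clinical fields.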

We believe it is not sufficient to simply use privacy preserving methods of data linkage. In a privacy-first approach to the use of real world data, important datasets can be independently reviewed to assure that they have been properly linked and de-identified with a minimal risk of re-identification. This provides independent verification that the proper procedures have been followed for de-identification, and the new, linked datasets maintain the privacy of the patients’ information.

Currently, the number of such certifiers is large and growing, and this additional requirement would assure that patient information and privacy is protected.

Privacy-preserving record linkage techniques can and should be used throughout the clinical trials life cycle to improve data quality

In our experience, the challenge of sourcing the right data for the right patient can limit the ability to use RWD in clinical research. As well described in the draft guidance, data quality can limit the ability to use RWD for regulatory decision making. Issues include data fragmentation across different health care systems, duplication of patients across linked data sets, and missing data. All of these factors make it difficult to have RWD of sufficient quality for use in regulatory decision making.

Our ecosystem partners have used privacy-preserving techniques to improve the quality of RWD in several ways:

  • By aggregating data from different data sources to ensure patients are followed longitudinally even if they switch care settings
  • By de-duplicating records from different datasets or across different organizations
  • By linking different types of data to ensure data completeness and to ensure relevant data points are captured for a holistic view of the patient population
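
The three uses above can be sketched in a few lines once records carry a de-identified linkage token. This is an illustrative sketch only: the `token`, `date`, and `event` fields are assumed names, and treating two records as duplicates when all three fields match is a simplification of real matching logic.

```python
from collections import defaultdict


def link_by_token(*sources):
    """Merge tokenized records from several sources into one de-duplicated timeline per token."""
    timelines = defaultdict(list)
    seen = set()
    for source in sources:
        for rec in source:
            fingerprint = (rec["token"], rec["date"], rec["event"])
            if fingerprint in seen:
                continue  # same event already contributed by another source
            seen.add(fingerprint)
            timelines[rec["token"]].append(rec)
    # ISO-style date strings sort chronologically, giving a longitudinal view per patient.
    return {tok: sorted(recs, key=lambda r: r["date"]) for tok, recs in timelines.items()}


# Hypothetical tokenized inputs from an EHR extract and a claims extract.
ehr = [{"token": "t1", "date": "2021-01-05", "event": "office_visit"}]
claims = [
    {"token": "t1", "date": "2021-01-05", "event": "office_visit"},
    {"token": "t1", "date": "2020-11-02", "event": "lab_test"},
    {"token": "t2", "date": "2021-03-10", "event": "pharmacy_fill"},
]

linked = link_by_token(ehr, claims)
```

Here the office visit reported by both sources appears once, and the lab test from the claims data extends the first patient's history, all without any identifiable fields in either input.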

High-quality matching and de-duplication do not require identifiable data. Techniques now exist that can improve study design, improve data quality and completeness, and aid post-marketing surveillance, all without exposing patients’ identifiable information.

FDA should use the minimum necessary RWD for study design, execution and monitoring, to limit risks to patient privacy

Finally, it is critical that privacy concerns are considered in determining the fitness for purpose of any dataset. For example, studies to inform study design (such as trial feasibility studies, or the use of real world data to assess inclusion/exclusion criteria) or observational research to narrow the focus of a research question may have a higher bar to protect privacy than randomized clinical trials in which patients have consented to participate. In a privacy-first approach to using real-world data, the level of privacy protection should be matched to the task or decision for which the data is being used. With privacy-preserving linkage techniques, it is possible to protect patient privacy across a wide spectrum of use cases, including trial design analysis, observational studies, and supplemental real-world data to support a clinical trial.

A privacy-first policy for real world data is possible now

We believe there has been significant progress in developing privacy-preserving methods to link real-world data in support of clinical trials and regulatory decision making. The FDA should leverage these advances to encourage a privacy-first approach to the use of real world data throughout the lifecycle of clinical trials research.
