
A Study Showing Real World Use of “Real World Data”

Travis May
Published January 14, 2019

Medical research is beginning to make use of “real world data,” the data generated when ordinary patients make everyday visits to doctors’ offices. A new paper in The Oncologist provides a powerful demonstration of how this data can be applied to real studies.


The gold standard of medical research has been the “randomized controlled trial,” a clinical trial that randomly assigns a small group of patients either to a particular treatment or to a control group. Data from these trials is used both for drug approval and for informing doctors’ treatment decisions. Increasingly, there has been interest in augmenting this approach with real-world data (RWD) coming from visits by real-world patients.

Advocates of real-world data studies point to two major advantages to using RWD:

  1. More data. There is much more data in the real world than there is in clinical trial contexts; millions of patients are treated each month, generating tremendous amounts of data (as opposed to the small percentage of patients participating in clinical trials).
  2. More representative. Clinical trials are not necessarily representative; tightly controlled trials are designed for internal validity, and their results can be difficult to generalize to real-world patient populations. In short, therapies perform one way in clinical trials, and another in the wild.

While real-world data studies are promising in theory, in practice there are major challenges to using this data. The data is messy (it gives an incomplete view of the patient’s journey), and the lack of randomization weakens the experimental validity of results.

The Oncologist recently published a great study by Sean Khozin et al., in which real-world data was used to better understand outcomes for lung cancer patients taking one of two PD-1 inhibitors that were recently approved by the FDA. The study is exciting for three reasons:

  1. Leverages real patient data to complement knowledge gained during pre-approval clinical trials. The primary finding in the study was that real-world patients had shorter overall survival than was reported in the clinical trial (the investigators also found no difference in outcomes based on age or line of therapy). The point here is that traditional clinical trials, which remain the scientific gold standard, can be strengthened with real-world data.

    You can only pack so much into a clinical trial protocol. It’s infeasible for clinical trials to provide all of the relevant information (comorbidities; drug-drug interactions; outcome variation by age, race, gender, etc.) or the definitive information (as here, overall survival outcomes in the real world) needed to inform treatment decisions. Retrospective real-world evidence studies like this one can step in to fill the gap.
  2. Demonstrates how knitting together real-world data enables valuable retrospective studies. The scientists are to be commended for their vision, but the unsung heroes of this study are the data managers and engineers who helped aggregate the data sources necessary to execute the study. Sponsors and investigators are still being educated on the mechanics of running real-world evidence studies, and this work helps point the way to how these studies will be conducted in the future.

    The primary data source for the study was Flatiron Health’s electronic health record (EHR) database, which draws on EHR data collected during routine patient care in over 260 community cancer care clinics. However, the investigators also linked three other sources: mortality data to better understand patient outcomes, an income database to better understand patients’ socioeconomic status, and several derived variables as a proxy for the clinician’s experience. By aggregating these data sources, the investigators were able to develop a more holistic view of the patients in the study, and to better understand the different factors that might be affecting patient outcomes.
  3. Accurate and representative patient selection. Clinical trial populations are notoriously homogeneous. Not all sectors of the population are equally likely to respond to patient recruitment efforts and actively enroll in trials. In a real-world study, however, data collection is passive and is derived from sources used in the ordinary course of care. As a result, one of the great promises of real-world data is that it will allow for more diverse and inclusive studies.

Here, patients were selected based on a review of unstructured data derived from EHR systems, which supported a more representative and more accurate cohort than selection based on disease codes alone.
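To make the data-linkage idea more concrete, here is a minimal Python sketch of joining EHR records to mortality and income data. It is an illustration only, not the method used in the study: the records are made up, and the field names (`token`, `zip3`, `therapy_start`) and the `link_records` helper are hypothetical.

```python
from datetime import date

# Hypothetical, made-up records. Tokens and field names are illustrative
# assumptions, not the schema of any real EHR or mortality database.
ehr_records = [
    {"token": "a1", "therapy_start": date(2016, 1, 10), "zip3": "100"},
    {"token": "b2", "therapy_start": date(2016, 3, 5), "zip3": "941"},
    {"token": "c3", "therapy_start": date(2016, 6, 20), "zip3": "606"},
]
mortality = {"a1": date(2016, 9, 10), "c3": date(2017, 6, 20)}
median_income_by_zip3 = {"100": 85000, "941": 112000}


def link_records(ehr, deaths, incomes):
    """Attach a death date and an area-level income to each EHR record."""
    linked = []
    for rec in ehr:
        death = deaths.get(rec["token"])  # None if no death record was found
        linked.append({
            **rec,
            "death_date": death,
            # Days from start of therapy to death, when a death is recorded.
            "days_to_death": (death - rec["therapy_start"]).days if death else None,
            # None when the income source has no entry for this area.
            "area_median_income": incomes.get(rec["zip3"]),
        })
    return linked


linked = link_records(ehr_records, mortality, median_income_by_zip3)
for row in linked:
    print(row["token"], row["days_to_death"], row["area_median_income"])
```

Even in this toy version, some patients have no death record and some areas have no income entry. Gaps like these are a small instance of the messiness that real-world studies have to handle explicitly.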

This is pioneering work. Real-world data is messy, but much of that messiness is hopefully solvable through studies like this one and the efforts of data analysts across the industry. We predict that the continued rise of RWD will have major impacts on the industry: within five years, we expect that EHR vendors, claims clearinghouses and other clinical data stores will have adapted to make their data usable (and linkable) for real-world evidence.

As a result of the shift beyond clinical trials, pharma companies and contract research organizations (CROs) will rapidly evolve to include deep expertise in both real-world study design and the underlying data management necessary to execute these studies.

Within ten years, at any given point in time, thousands of these retrospective real-world studies will be running concurrently, enhancing our knowledge of different therapies and treatment protocols. Further studies are underway, and patients will be better off as researchers make better use of the explosion of information happening in the real world.

Editor’s note: This post was updated in December 2022 for accuracy and comprehensiveness.

