Real-World Data: What Is It and Why Does It Matter?

Real-world data is critical to clinical research, drug development, and patient insights for better outcomes. Learn about the use of RWD in healthcare.

Introduction to Real-World Data

Real-world data comes from numerous sources and is highly valuable in healthcare research. These data have applications in demonstrating efficacy, enhancing clinical trial design, and improving lifecycle management by reducing the resources required for post-market surveillance studies.

Real-world data (RWD) and real-world evidence (RWE) have the potential to influence the prevention, diagnosis, and treatment of various conditions. As the use of real-world data expands, however, it also presents a variety of challenges.

Chapter 1

Real-World Data Overview

Both RWD and RWE have many valuable uses in healthcare, including patient recruitment for clinical trials, comparing drug efficacy, and monitoring drug safety. However, it is important to understand the key differences between these two concepts.

What Is Real-World Data?

Clinical trials seek to answer specific questions in a controlled environment to earn regulatory approval of an investigational drug or device. Typically, the information gathered from these trials lacks data from a real-world environment. Therefore, post-market studies are often critical to understanding patient adherence and clinical efficacy in the real world, outside of a controlled clinical study.

The pharmaceutical industry has traditionally used randomized controlled trials when seeking approval of therapies, but the U.S. Food and Drug Administration (FDA) has been developing a framework and issuing guidance to support the use of real-world evidence—enabled by RWD—in regulatory decision-making. Additionally, technological innovations that protect patient privacy have expanded the possible sources of real-world data available to researchers immensely.

RWD comes from a variety of sources outside of traditional clinical trials. Researchers routinely collect patient data from sources such as:

  • Claims and billing activity
  • Electronic health records (EHRs)
  • Patient-reported outcomes
  • Disease or product registries
  • Biometric monitoring sources such as pedometers and smartwatches

Health data from these sources can provide a more comprehensive picture of the patient journey and experience, and can even provide an overview of population health. Although RWD can enable clinical evidence, it’s essential that the data is fit-for-purpose (i.e., relevant, valid, and reliable), which is determined based on the research question.

How Does Real-World Data Differ from Real-World Evidence?

Real-world data and real-world evidence are often used interchangeably, but they are two different concepts. RWE derives from the analysis of RWD and can provide valuable information about the risks, benefits, and use of a therapy. Real-world evidence can help accelerate the approval of new therapies, especially in oncology.

What Is the Value of Real-World Data and Real-World Evidence?

By using sources of patient health data, researchers can evaluate therapies in a larger population, in real-world conditions, and at a lower cost than with typical clinical trials.

RWD has the potential to provide information about a more diverse population than the typical clinical trial participants. Therefore, researchers can get valuable efficacy and safety information on a more representative population than they can from a randomized clinical trial.

RWE provides a more comprehensive view of how a therapy will work in a real-world setting. Researchers can evaluate the therapy while factoring in other variables such as comorbidities, demographic groups, and age groups, among other parameters. Most importantly, RWE helps researchers develop a better understanding of the long-term use of the therapy beyond the clinical trial period.

Example Use Cases of RWD and RWE

Some ideal use cases for RWD and RWE include supporting regulatory requirements and informing treatment plans for patients.

RWE can help support regulatory requirements to expand on a therapy’s indication without performing a full additional clinical trial. For example, if a product is often prescribed for off-label conditions, companies may use RWD to study patient outcomes and therapy safety and then submit this information to regulators for market authorization.

Healthcare providers can use RWD and RWE to better inform a patient’s treatment plan, procedures, tests, and prescriptions, and these data may help develop practice guidelines. For instance, during the beginning of the COVID-19 pandemic, public health officials needed to rapidly evaluate and share information on the prevention and treatment of COVID-19. Much of the information gathered during this time leveraged RWD.


Chapter 2

Real-World Data Ecosystem

Numerous types of patient data gathered from multiple sources can be useful for generating RWE. To develop a deeper understanding of how RWD can be used—and how it can add value in healthcare—examining key data types and sources can be instructive.

Real-World Data Types

Health data can be pulled from numerous sources to provide valuable insights into the patient journey.

Claims Data

Claims data results from processing a healthcare claim. There are two types of claims data: open and closed. Open claims datasets come from claims clearinghouses or providers’ revenue cycle management systems. They cover a large number of patient lives, but may not represent complete claims coverage for a given patient.

Closed claims come from health insurance plans or self-insured employer groups. They tend to cover a smaller scale of patient lives, but represent complete claims coverage for a given patient during the time that patient was on the insurance plan or worked at the employer. Claims data is longitudinal in nature and captures a long period of the patient journey, but it does not have as much depth of clinical detail about a particular medical encounter as other data types.

Because closed claims datasets are very comprehensive, they prove ideal for health economics and outcomes research (HEOR) that considers a patient’s journey, resource utilization, and the economic burden of their condition. Open claims datasets prove less useful for HEOR, due to their incompleteness for a given patient. Given the large scale of patients covered as well as lower data latency, open claims datasets can prove useful for marketing use cases.
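The completeness property that makes closed claims suitable for HEOR can be illustrated with a small sketch. It assumes a hypothetical closed claims dataset that records enrollment spans per patient, and checks whether a study window is fully covered by a single span. The `Enrollment` schema and the `is_fully_observed` helper are illustrative names, not part of any real dataset or library.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Enrollment:
    """One continuous span during which a patient was on a plan (hypothetical schema)."""
    patient_id: str
    start: date
    end: date  # inclusive


def is_fully_observed(enrollments, patient_id, window_start, window_end):
    """Return True if a single enrollment span covers the whole study window.

    Simplifying assumption: adjacent spans are not merged, so a patient with
    back-to-back spans would be treated as having a gap.
    """
    return any(
        e.patient_id == patient_id and e.start <= window_start and e.end >= window_end
        for e in enrollments
    )


enrollments = [
    Enrollment("p1", date(2020, 1, 1), date(2022, 12, 31)),
    Enrollment("p2", date(2021, 6, 1), date(2021, 12, 31)),
]

study_start, study_end = date(2021, 1, 1), date(2021, 12, 31)
p1_complete = is_fully_observed(enrollments, "p1", study_start, study_end)
p2_complete = is_fully_observed(enrollments, "p2", study_start, study_end)
print(p1_complete, p2_complete)  # True False
```

Open claims typically lack an equivalent enrollment record, which is why the same check cannot establish completeness for a given patient in those datasets.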

Claims data is even more powerful when used in tandem with clinical metrics such as lab data, EHR data, or patient-reported outcomes. The combination of these data sets can provide deeper insight into symptoms, disease progression, and clinical outcomes.

Laboratory and Genomics Data

Lab testing data proves valuable for a variety of use cases in healthcare analytics, from market sizing to monitoring disease progression to finding biomarker signals of patients eligible for certain therapeutics. Lab data can provide a deep point-in-time clinical and biochemistry profile of a patient, but isn’t as longitudinal as claims data.

Genomics data is a specialty area of lab testing currently growing in popularity within healthcare analytics given the increase in biomarker-targeted therapeutics. Genomics data proves useful for both clinical development use cases in which scientists may employ genomics data to inform biomarker selection, or in commercial use cases in which genomic results may provide input towards building a predictive model to find patients eligible for a biomarker-targeted therapy.

Pharmacy Data

Pharmacy data provides information about which therapies have been prescribed to patients and filled at a pharmacy. It can give insight into how therapies change over time. Pharmacy data proves extremely useful for specialty drugs, which now account for approximately 75 percent of prescription drugs in development. Specialty drugs are typically distributed through a network of specialty pharmacies contracted by pharmaceutical manufacturers. The manufacturer then aggregates real-world data from those pharmacies to understand real-world prescribing, dispensing, and medication adherence patterns.

Electronic Health Records (EHR)

As patients move throughout the health system, valuable real-world data is collected as part of their electronic health records (EHRs). EHR data contains richer clinical detail than claims data, but a patient may visit many providers across different care settings, using different EHR systems. As a result, a single EHR source is unlikely to capture a patient’s complete record. EHRs contain data on appointments, medical history, diagnoses, symptoms, medications prescribed, labs, and chart notes. These data are important for gaining a more granular understanding of clinical patient outcomes.

Even though EHR data is valuable, it requires significant curation and cleaning because much of the valuable information may reside in unstructured physician notes.

Generally, the information recorded as part of a patient’s EHR—whether they’re in an inpatient setting, outpatient setting, or a specific therapeutic area—includes:

  • Procedures performed
  • Diagnoses
  • Vital signs
  • Laboratory results
  • Medication orders
  • Medications administered
  • Patient surveys or questionnaires
  • Surgical care information
  • Symptoms
  • Immunizations
  • Social history, such as smoking status

Note that electronic health records on their own may not contain all of the necessary RWD, so researchers may be required to seek additional sources of data.

Oncology Data

RWD plays an important role in answering a variety of research questions surrounding cancer. One of the top priorities in research is generating accurate evidence on the efficacy of cancer prevention, diagnosis, and treatment in a real-world setting.

Researchers often study cancer treatments in a select population in a clinical trial setting. However, researchers can also collect and analyze real-world oncology data to provide RWE on the efficacy and tolerability of new treatment methods in the real world. The main sources of real-world oncology data include:

  • Registries
  • Claims
  • EHRs
  • Specialty data providers and networks

Each state legally mandates a central cancer registry, which provides a census of all patients who have cancer within a defined geographic area. Because of this population-wide coverage, and because registries can capture detailed exposure information such as diet or physical activity as well as patient-reported outcomes, they provide information that is not limited to a selected group of patients.

Limitations of cancer registry data include a lack of information on long-term treatment and on outcomes other than survival. Addressing these limitations requires new initiatives, such as linking registry data with data from other organizations. These initiatives, together with real-time access to pathology reports, provide opportunities to better understand therapeutic advances and their impact outside of clinical trials.

Consumer Data

Demand for consumer data among researchers has increased recently. This information can provide additional context about a patient population, such as:

  • Employment
  • Socioeconomic status
  • Interests
  • Health
  • Race
  • Ethnicity
  • Languages spoken

These data come from consumer data companies and have traditionally been used for targeted marketing. Note that consumer data companies only have data on adult consumers.

Social Determinants of Health (SDOH)

Social determinants of health (SDOH) are the conditions in which people are born, work, live, play, age, and worship. These have a large impact on people’s health, quality of life, and functioning. Some examples of social determinants of health include:

  • Poverty
  • Education
  • Racism
  • Polluted water and air
  • Access to healthy, nutritious foods
  • Physical activity opportunities

Social determinants of health contribute to health inequities and disparities. For example, those who don’t have access to grocery stores that carry healthy foods may not have good nutrition, which can lead to obesity, diabetes, and heart disease. Data on SDOH can provide insights to address health disparities and health equity.

Real-World Data Sources

Different types of data providers suit different situations. Data providers fall into three categories:

Data Platforms

Data platforms provide a technology platform with intuitive user interfaces (UIs) for analyzing data within the platform. These companies have data science and data engineering teams that clean and standardize the continuous streams of data coming into the platform. Combined with the UI layer, this makes the data user-ready.

In many cases, the platform provides limited ability to export data for use. Working with a data platform is best for companies without data analytics or data engineering capabilities.

Data Aggregators

Data aggregators offer cleaned and standardized data that have been aggregated from many underlying sources. Typically, a technology platform or user interface overlay doesn’t exist, and the data are available to license as a one-time or continuous data feed.

These data are analytics-ready. Companies working with a data aggregator need data analysts or business intelligence analysts who can turn the data into analyses, but they do not need sophisticated data engineering capabilities to clean and standardize the data.

Data Originators

Data originators are closest to the source. They have the most granular and detailed data, but they do not clean it. These data require sophisticated data engineering before they can be analytics-ready.

Real-World Data Solutions Providers

The RWD ecosystem includes both real-world data sources, as described above, and solutions providers that have built analytic and workflow solutions on top of real-world data. Many platform companies are also solutions companies, having built specific data views and analytic tools that provide solutions for specific use cases.

Common commercial analytics and clinical development solutions built on top of real-world data include:

Commercial Solutions

  • Specialty pharmacy aggregation: These companies aggregate specialty pharmacy data on behalf of pharmaceutical manufacturers to monitor therapy launches. Specialty drug data is proprietary data of pharma companies that may link to other real-world data such as claims for a longitudinal view of the patient journey.
  • Outcomes and patient journey: These companies enable outcome studies and patient journey research. Many of these companies build their solution on top of aggregated and linked claims data to enable a comprehensive view of patients as they move through the healthcare system.
  • Commercial triggers: These companies provide triggers to commercial teams at pharmaceutical companies to alert them when a patient eligible for a specific therapy sees their provider, so sales teams can be deployed to the provider’s office for education on the relevant disease or therapy. Use of this solution is especially common in rare diseases since providers are often unaware of the rare disease and patients can be hard to diagnose.
  • Digital marketing: These companies identify relevant patients and providers and then serve up digital advertising to educate them on a disease or therapy.
  • Commercial analytics and insights: These companies provide aggregated data, usually claims or EHR, to help commercial teams with strategy and insights before and post-launch.

Clinical Solutions

  • Trial recruitment: These companies use aggregated data—typically EHR, lab, and claims data—to identify the ideal clinical trial sites that have sizable populations of patients who would meet inclusion/exclusion criteria for a trial.
  • Synthetic control arms: Synthetic control arms, also known as external control arms, use real-world data as the control group instead of enrolling patients into a control arm that receives a placebo or standard of care (SOC).

    This approach is popular in disease areas where patient populations are increasingly sub-stratified by biomarker status (e.g., oncology, rare disease), given the challenges of recruiting enough patients as well as the ethical considerations of placing patients on placebo or SOC. Companies that provide these solutions often have deep and highly curated clinical and genomic data with which to construct synthetic control arms. Synthetic control arms can lower trial costs, increase efficiency, and speed therapies to market.
  • Decentralized clinical trials: These companies provide technology infrastructure to collect data and support decentralized clinical trials (DCTs). DCTs are trials in which patient communication and data collection have been decentralized away from a traditional clinical trial site. Instead, remote and digital technologies are used to communicate with study participants and collect their data.

Chapter 3

Major Use Cases for Real-World Data in the Healthcare Industry

RWD brings a lot of value to different organizations in the healthcare industry, from life sciences to payers to public health agencies.

Life Sciences

Biopharmaceutical organizations can use RWD across the entire drug development lifecycle, from pre-clinical to clinical development to commercial planning and post-marketing monitoring.

In the pre-clinical and clinical development settings, organizations can use RWD for:

  • Biomarker selection (pre-clinical)
  • External control arms (clinical development)
  • Long-term follow-up
  • Confirmation of patient medical history for clinical trial enrollment
  • Additional analysis of patients’ social determinants of health

During the commercial phase, organizations can use RWD for:

  • Market access strategy
  • Sales force planning
  • Monitoring launch effectiveness
  • Drug efficacy comparisons
  • Evidence generation to support reimbursement
  • Commercial targeting


Payers can use RWD to:

  • Assess and validate value-based contracts
  • Improve risk adjustment calculations
  • Develop a holistic, longitudinal view of applicants

Payers often use RWE to inform comparative efficacy in a real-world setting after a drug has launched to validate coverage. According to recent research, approximately 85 percent of pharmacy administrators reported using RWE to make formulary decisions in oncology for comparative efficacy when clinical trial data wasn’t available.

Healthcare Providers


At the individual patient level, real-world data and real-world evidence are often used for decisions on:

  • Procedures
  • Test orders
  • Prescriptions for patients

RWD and RWE can help healthcare providers create targeted treatment plans for patients. Within the larger hospital system, providers may use them to inform the creation of practice guidelines and the further adoption of these guidelines.

Data and Analytics

Healthcare is becoming more digital due to innovative technologies and the demand for real-world data. Organizations increasingly depend on RWD and RWE to develop analytics, machine learning, and artificial intelligence (AI) applications.

For example, RWD can:

  • Train AI models and predict populations at risk of a particular disease
  • Identify better treatments
  • Understand patient prioritization
  • Improve marketing precision
  • Understand patient behavior

Clinical Research Networks

Clinical research is the process of determining the efficacy and safety of new treatments. Clinical research should directly impact patient care, bringing insights from “bench to bedside”. Clinical research networks aim to have greater impact by collaborating across many health institutions, either regionally or nationally, to promote clinical research. For example, they can build large, diverse patient pools with RWD to enhance patient recruitment.

RWD has proven to be instrumental for understanding the efficacy and safety of treatments for COVID-19 as well as post-acute sequelae of SARS-CoV-2 infection (PASC), known colloquially as “long COVID”. Creating a national network for COVID research has led to scientific and operational efficiencies, as well as faster discoveries and improvements that make a difference in people’s lives.

Government


In government applications, RWD and RWE provide benefits for regulatory agencies such as the FDA and the European Medicines Agency (EMA). RWD and RWE can be employed alongside randomized clinical trial evidence for post-market safety monitoring, adverse event signal detection, and marketing authorization.

In 2008, the FDA adopted the Sentinel Initiative to assess approved product safety by integrating nationwide claims and EHR data. The FDA is the primary user of this system, but the system also provides valuable information to researchers and biopharmaceutical companies.

RWD also provides value in the regulatory approval process, in addition to monitoring side effects. For example, physicians can prescribe therapies off-label in the U.S., but the regulatory label of the therapy in question can determine coverage decisions and even how many patients will be able to receive treatment.

Drawing on RWD, the FDA is increasingly expanding regulatory labels to allow more patients to receive treatment. For example, a therapy initially approved only for women with ER+/HER2- breast cancer was later approved for use in men because of patient outcome data reported in that patient population’s EHRs.


Chapter 4

Challenges of Real-World Data

As important as RWD is, it also presents a number of challenges. A plethora of patient data exists, but before researchers can use and analyze it, the data must be de-identified for patient privacy. Additionally, because RWD must be fit-for-purpose, finding the right, relevant data for the applicable use case can prove challenging.

Navigation of the Expanding Data Landscape

The expanding availability of data is creating the demand for additional data, especially specialized data. As more data becomes available, it opens up the possibility of more comprehensive analyses.

The availability of more specialized data doesn’t mean it’s the right data. The right data has become increasingly difficult to find, and finding the right data partner presents a bottleneck to data-sharing.

It’s essential to provide partners in the healthcare ecosystem with the necessary data-sharing tools. Then, it’s essential to show data users where to find the necessary data for their particular use. Assessment tools that better facilitate data exploration, segmentation, and overlap comparison help with analyzing and sharing data.

Data-Sharing Technologies

As more data is generated, new patient privacy risks emerge, creating demand for data sharing that protects patient privacy while maintaining transparency and accountability. The key? An ecosystem of technology companies that enable data management, governance, and data application.

Data-sharing technologies are an unmet need in real-world data. Ideally, every data partner should have control over their records as well as confidence in the integrity of their data. Data providers and users each want something different. Providers want to keep their competitive advantage and keep data independent of their competitors and peers, while users want to easily analyze data without being tied to a specific provider. Both providers and users want transparency.

The solution is a trusted third-party data enclave that doesn’t buy or sell data and has security and privacy as the highest priorities. Collaboration is key to seamless partnership across the RWD ecosystem.

Data Standardization and Quality

RWD is often non-standard. Data is collected in many different formats, and although many standard data models exist, they apply to different types of data. As a result, data recipients spend significant resources standardizing datasets before they can be analyzed.

Real-world data is also often incomplete, which affects the accuracy of later analyses because recipients cannot know whether the data reflects the entire patient journey. When an outcome is absent from the data, it’s unclear whether it didn’t occur at all or the data simply didn’t capture it.
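One way to keep this ambiguity visible in an analysis is to record a three-state outcome instead of a yes/no flag. The sketch below is a minimal illustration under stated assumptions: each patient has a known last date of data coverage, and the study defines a follow-up end date. The `OutcomeStatus` enum and `outcome_status` function are hypothetical names introduced only for this example.

```python
from datetime import date
from enum import Enum


class OutcomeStatus(Enum):
    OBSERVED = "event recorded within follow-up"
    NO_EVENT_RECORDED = "no event, and coverage spans the full follow-up"
    UNKNOWN = "no event recorded, but coverage ended before follow-up did"


def outcome_status(event_dates, coverage_end, follow_up_end):
    """Classify an outcome so that missing data is not read as 'no event'.

    event_dates: dates on which the outcome was recorded (may be empty).
    coverage_end: last date on which the source could have captured the patient.
    follow_up_end: last date the research question cares about.
    """
    if any(d <= follow_up_end for d in event_dates):
        return OutcomeStatus.OBSERVED
    if coverage_end >= follow_up_end:
        return OutcomeStatus.NO_EVENT_RECORDED
    return OutcomeStatus.UNKNOWN


follow_up_end = date(2022, 12, 31)
a = outcome_status([date(2022, 3, 1)], date(2022, 6, 30), follow_up_end)
b = outcome_status([], date(2023, 1, 31), follow_up_end)
c = outcome_status([], date(2022, 6, 30), follow_up_end)
print(a.name, b.name, c.name)
```

Patients in the `UNKNOWN` state can then be excluded, censored, or handled by imputation, depending on the research question.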

However, as more and more data is generated, the opportunity exists for companies to collaborate on data cleaning, harmonization, and imputation.

Unstructured Data

Data used for analysis is generally structured, meaning it has a standardized format and follows a defined order. As the health data ecosystem expands, however, much of the data coming online is unstructured.

Unstructured data has enormous potential. For example, clinician notes may better describe a patient’s medical history or quality of life. Genomic sequencing may provide better insights into the benefits of precision medicine.

Acquiring the right data proves challenging because of the difficulty in determining which information is relevant to the use case. Additionally, deriving insights from unstructured data can prove difficult because it requires complex programs to process. An unmet need exists for technologies that can turn unstructured data into usable data inputs.

Patient Privacy and Data Utility

Patient privacy is essential to the use of RWD, but preserving data utility at the same time may prove challenging. The key? Finding a balance between protecting privacy and maintaining utility. The choice between de-identifying patient data through safe harbor versus expert determination depends on research objectives, patient privacy, and business needs.

Expert determination is generally ideal when striking a balance between utility and privacy because of the flexibility it offers. For example, experts may recommend redaction, removal, or modification of specific identifying data elements in the data set, whereas safe harbor removes a fixed set of 18 predetermined types of identifiers.
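To make the fixed-rule nature of safe harbor concrete, here is a deliberately simplified sketch. It handles only a handful of the 18 identifier types, assumes a hypothetical flat record with ISO-formatted dates, and omits rules such as the population check on three-digit ZIP codes. It is an illustration of the idea, not a compliance tool, and the field names are invented for this example.

```python
# Illustrative subset only; safe harbor covers 18 identifier types.
DIRECT_IDENTIFIER_FIELDS = {"name", "email", "phone", "ssn", "medical_record_number"}


def safe_harbor_style(record):
    """Return a copy of `record` with a simplified safe harbor-style treatment."""
    # Drop fields that directly identify the patient.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIER_FIELDS}

    # Reduce dates to the year (assumes an ISO "YYYY-MM-DD" string).
    if "birth_date" in out:
        out["birth_year"] = out.pop("birth_date")[:4]

    # Keep only the first three ZIP digits (population check omitted here).
    if "zip" in out:
        out["zip3"] = out.pop("zip")[:3]

    # Group ages above 89 into one category and drop the year that reveals them.
    if out.get("age", 0) > 89:
        out["age"] = "90+"
        out.pop("birth_year", None)
    return out


record = {
    "name": "Jane Doe",          # made-up example values
    "ssn": "000-00-0000",
    "birth_date": "1950-04-12",
    "zip": "94107",
    "age": 73,
    "diagnosis_code": "E11.9",
}
deidentified = safe_harbor_style(record)
print(deidentified)
```

Every record gets the same treatment regardless of the research question, which is the inflexibility that expert determination is meant to address.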

De-identification’s primary disadvantage is its time-intensive nature. It can often take months to complete, but the right expertise and technology can greatly accelerate the process while ensuring the data is fit-for-purpose.


Chapter 5

The Future of Real-World Data

Expanding technology and changes in regulations provide plenty of opportunities for the expansion of RWD and quicker de-identification for patient privacy.

New Real-World Data Use Cases

A future use case of real-world health data involves advancements in genomics testing. In oncology, radiographic features may help determine the genomics of a tumor. Clinicians could use this insight in real time to correctly diagnose the patient and start the appropriate treatment.

This data advancement improves precision medicine and helps deliver more meaningful insights. However, the need to maintain patient privacy remains. Using advanced methods such as deploying synthetic data could meet patient privacy needs. Genomics data doesn’t just benefit oncology—it could lead to advancements in treatments for rare diseases and other therapeutic areas as well.

New Types of Data Available

New policies, scientific discoveries, and innovative healthcare technologies have increased the variety and volume of available health data.

Genomic sequencing combined with the increase of biomarker-specific drugs has led to increased genetic testing. Wearable technology and health apps collect data about users’ heart rates, steps taken, geo-locations, and more. Air quality, climate, and weather are also becoming more influential data points. For example, weather can help predict the severity of an allergy season, pandemic spread, and even flu prevalence.

All of these factors have led to the rapid growth of RWD. In fact, healthcare data is growing faster than data from any other industry.

Recent Trends with RWD

The increase in health data has brought about the increased use of RWD and RWE in healthcare. Some recent health data trends include:

Demographics and SDOH

Government, health systems, and life science researchers are striving to understand the reasons for disparities and worse patient outcomes in vulnerable populations.

Growing Use of Genomics Data

Data science, machine learning, and artificial intelligence are empowering scientists to tackle questions once left unanswered—such as whether genetic alterations cause health conditions and whether lifestyle or demographic considerations are relevant to a disease. The ability to derive meaningful insights and patterns from genetic sequencing data and other health data is opening new paths for future progress in the field.

Growing Acceptance of Real-World Data

FDA real-world data guidance documents continue to be released, pointing to the growing acceptance of RWD for regulatory decision-making.

Disease Registries

Registries capture specific variables related to various conditions, which are then validated to a higher standard than EHR data. This makes registry data well suited to clinical trials and to preparing data for regulatory submissions.

Patient-Reported Outcomes Data

Patient-reported outcomes data offers a more direct view of the patient experience.

Data on COVID-19 Vaccinations and Variants

Since the beginning of the pandemic, information on long COVID and the lingering impacts of COVID-19 on a person’s health has been in high demand.

Rare Disease Data

Patients with rare conditions often see numerous specialists and receive specialty drugs, so the patient journey may be fragmented and data spread amongst many partners.

Linking Proprietary Data with Real-World Data

To gain a comprehensive understanding of patient health, a recent trend in healthcare is to link proprietary pharma company data—such as clinical trials, aggregated specialty drug data, and disease or device registries—with real-world data offered by commercial data providers.


Tokenizing every clinical trial and health economics and outcomes research study can enable expedited partner identification for multiple studies at the same time.
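The article does not describe a specific tokenization method, so the following is only a sketch of the general idea: deriving the same opaque token from the same patient identifiers in two datasets so that records can be linked without exchanging the identifiers themselves. It assumes a keyed hash (HMAC-SHA256) over normalized fields. The `make_token` helper, the key, and all row values are hypothetical, and production tokenization involves considerably more than this.

```python
import hashlib
import hmac


def make_token(first_name, last_name, birth_date, secret_key):
    """Derive a token from normalized identifiers using a keyed hash."""
    normalized = "|".join(
        part.strip().lower() for part in (first_name, last_name, birth_date)
    )
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


KEY = b"demo-key-not-for-production"  # hypothetical shared secret

# Each party tokenizes its own data, then shares only tokens plus attributes.
trial_rows = [
    {"token": make_token("Ana", "Lopez", "1980-02-03", KEY), "arm": "treatment"},
]
claims_rows = [
    {"token": make_token(" ana ", "LOPEZ", "1980-02-03", KEY), "claim_code": "A100"},
    {"token": make_token("Ben", "Ito", "1975-09-30", KEY), "claim_code": "B200"},
]

# Link on the token only; names and birth dates never appear in the output.
trial_by_token = {row["token"]: row for row in trial_rows}
linked = [
    {**trial_by_token[c["token"]], **c}
    for c in claims_rows
    if c["token"] in trial_by_token
]
print(len(linked), linked[0]["arm"], linked[0]["claim_code"])
```

Because the token depends only on the identifiers and the key, a study tokenized once can be matched against many partner datasets without re-identifying participants.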


Chapter 6

Partner with the Largest Health Data Ecosystem

RWD helps accelerate and inform decisions that improve patient outcomes. New data, advancing technologies, and novel RWD use cases are leading healthcare to new frontiers.

Interested in joining us in improving patient outcomes through data connectivity? Contact us today for more information.

Real-world data ecosystem

Gain access to more than 500 partner organizations and every type of linkable real-world data.
