Health Data Trends Part II: New Data Types in the Datavant Ecosystem

Publish Date: January 18, 2022


In Health Data Trends Part I, I reviewed the most common questions that health professionals answer using health data. This time, I’ll review new health data available through Datavant’s ecosystem partners and its value in answering specific questions. As partners continue to join our ecosystem, there are more opportunities for health systems, insurers and biopharma companies to connect health data that completes the picture of patient health.

1. Electronic Health Records (EHRs)

After health insurance claims, EHRs comprise the largest number of de-identified patient data records in the Datavant ecosystem. Last year, we added 17 EHR data partners. As a result, we can now enable clients to connect to EHR data on over 300 million de-identified patients. In the last several years, the caliber of EHR data has improved. While most EHR partners standardize structured data from the EHR system, some partners also abstract tailored concepts from clinical notes for CMS reporting and FDA submissions. Some EHRs are also focused on specific disease states such as oncology, rheumatology, mental health, women’s health, and dermatology. They capture disease-specific variables which make them more relevant for answering questions related to that condition.

2. Health Systems

In addition to new EHR data partners, the Datavant ecosystem added thousands of health system relationships last year, enabling full medical record retrieval with patient consent across 2,000 hospitals and 15,000 clinics. Healthcare providers and health insurers already use this capability for compliant health data exchange. We see new use cases in the life science community for complete medical records access, which we believe will transform first-party clinical research by vastly expanding the amount of health data available on each trial participant. For instance, a sponsor could conduct long-term follow-up post-trial or supplement trial data to understand adverse events, super-responders or non-responders. Full medical records differ from traditional EHR data sets, which are typically a structured subset of data fields from the EHR. Medical record retrieval includes all of the EHR’s unstructured data, which accounts for 80% of all the information contained in the EHR [1]. I will do a more comprehensive overview of chart retrieval and associated use cases in a future blog post.

Several provider groups (four health systems and two research consortiums) are also using Datavant technology to de-identify and connect external real-world data to patient data for health system research and healthcare performance analysis.

3. Registry Data

Five registry organizations joined the ecosystem in disease states like immunology, cerebral palsy, ophthalmology and several rare diseases (such as pulmonary arterial hypertension (PAH), hemophilia, phenylketonuria (PKU) and others). Registries capture very specific variables related to each disease. Registry data is validated and quality checked to a much higher standard than typical EHR data. The quality of this data makes it an ideal source to link to a biopharma company’s clinical trial data. These organizations also offer study services teams that can prep data for regulatory submissions.

4. Specialty Pharmacy (SP) Data

Specialty drugs account for 75% of prescription drugs in development [2]. Pharma companies have a view of their own specialty drug distribution but have limited visibility into the full patient journey before and after patients begin treatment. Pharma companies can connect SP data with patient hub data and claims data to understand lines of therapy, patient adherence and the effectiveness of hub services. Four new sources of SP data joined the ecosystem last year, bringing Datavant’s coverage of the SP space to dozens of players.

5. Genomics, Digital Pathology and Specialized Lab Data

Eight new genomics and diagnostics data partners joined the Datavant ecosystem last year. One is a biobank associated with a large academic medical center with millions of genetic sequencing test results, many of which are linked to EHRs. Another is a genomics testing collaborative and another provides specialized liquid biopsy testing to classify risk in lung cancer. Lastly, we added a provider of COVID testing and associated variant sequencing data.

6. Consumer Data

Consumer data includes demographics like age, gender and race; social determinants of health (SDoH) such as employment, income, and education; and behavior, lifestyle and purchasing pattern propensity scores. Five new consumer data partners joined the Datavant ecosystem in 2021. Consumer data supports many use cases, including:

  • Comparing outcomes by gender, race, age, income, education and employment
  • Identifying barriers to accessing high-quality care
  • Evaluating how lifestyle and purchase decisions influence overall patient health

7. Wearables and Digital Health

Wearables, digital health interventions, remote monitoring apps and condition-specific social networks are improving health and wellness. We added five wearable and digital health companies last year. These technologies create engaging patient experiences and collect continuous data on various biometrics such as sleep duration and quality, heart rate, and activity levels. Many of these companies combine disease-specific devices and mobile apps, such as implantable continuous glucose monitors (CGMs) for diabetes or wearable sensors for musculoskeletal conditions. Linking this health data offers continuous, real-time insight into patient health.

8. Mortality Data with Cause of Death

Mortality is a key endpoint in many studies, yet it is often not captured in EHRs or EMRs. Datavant ecosystem data partners aggregate mortality data that covers more than 85% of U.S. death events. In 2021, we added one new source that includes cause of death, which is particularly valuable in clinical research. Mortality data should be linked to every trial with a mortality endpoint to maximize data completeness. Health systems should link it to measure the effectiveness of patient care, understand their active patient population and identify underserved populations. We’ve even seen payers use it to detect fraudulent claims.

9. Weather

On the cutting edge of health data is the integration of weather and environmental conditions data. A large source of this data joined the Datavant ecosystem last year. Weather, air quality and climate are having a growing influence on health. Weather data is being used to manage supply chains, predict pandemic spread, and estimate flu prevalence and the severity of allergy season, to name a few use cases.

10. First-Party Data

In 2021, Datavant tokenized 100+ first-party data sets for life sciences clients, representing a variety of proprietary health data including clinical trial data, patient registries, hub data, and sponsored genetic testing data. Tokenizing proprietary first-party data and linking it to third-party commercial data is valuable for many reasons. Use cases include understanding clinical and economic outcomes of patients using a certain device, building lookalike models to reach rare disease patients, assessing the effectiveness of patient services, and conducting long-term follow-up of trial patients. Linking proprietary first-party data is an important part of enterprise data strategy.
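To make the idea of tokenizing and linking concrete, here is a minimal Python sketch. It is an illustration only and not Datavant's actual method: it assumes a simple keyed hash (HMAC-SHA256) over normalized name and date of birth, and every name in it (`SECRET_KEY`, `make_token`, `tokenize`, `link`) is hypothetical, as are the sample records.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real system would manage keys securely.
SECRET_KEY = b"example-key-not-for-production"

ID_FIELDS = ("first", "last", "dob")


def make_token(first, last, dob, key=SECRET_KEY):
    """Derive a token from normalized identifiers using a keyed hash."""
    normalized = "|".join(part.strip().lower() for part in (first, last, dob))
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize(records, key=SECRET_KEY):
    """Drop the identifying fields from each record and add a token instead."""
    out = []
    for rec in records:
        rest = {k: v for k, v in rec.items() if k not in ID_FIELDS}
        rest["token"] = make_token(rec["first"], rec["last"], rec["dob"], key)
        out.append(rest)
    return out


def link(left, right):
    """Join two tokenized data sets on matching tokens."""
    index = {rec["token"]: rec for rec in right}
    return [{**rec, **index[rec["token"]]} for rec in left if rec["token"] in index]


# Invented sample data: a first-party trial data set and a third-party claims data set.
trial = [
    {"first": "Ana", "last": "Diaz", "dob": "1980-02-01", "arm": "treatment"},
    {"first": "Ben", "last": "Okoro", "dob": "1975-11-30", "arm": "placebo"},
]
claims = [
    {"first": " ana", "last": "DIAZ ", "dob": "1980-02-01", "er_visits": 2},
]

linked = link(tokenize(trial), tokenize(claims))
print(linked)
```

In this toy example, the two data sets never need to share names or birth dates with each other. Each is tokenized separately with the same key, and only records whose tokens match are joined, so the trial arm and the claims history end up on one de-identified row.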


Real-world data is becoming more granular and finely tuned to answer questions across specific diseases and patient types. New data sources are entering the ecosystem earlier in their life cycle and seeking partnerships to inform their data strategy. Many are building flexible data platforms that allow health data users to link their first-party data to the platform.

I’ll publish more on health data trends throughout the year. Tell us what health data types you want to learn more about through this short 2-question survey and we’ll keep you up-to-date on new content.

You can always read more about health data and health data infrastructure on the Datavant blog, where you’ll find a detailed post about real-world data infrastructure and how it is being used to fight COVID, written by Datavant’s President and Co-founder, Travis May. Email me anytime if you have questions or comments!

  1. Kong, Hyoun-Joong. Managing Unstructured Big Data in Healthcare System. Healthcare Informatics Research. 2019; 25(1): 1–2. doi:10.4258/hir.2019.25.1.1
  2. Top tech predictions for the future of specialty pharmacy. BioPharma Dive. July 22, 2021. Accessed January 9, 2022.

Editor’s note: This post was updated on October 19, 2022 for accuracy and comprehensiveness.
