Top 5 Trends in the Healthcare Data Ecosystem

Publish Date: January 23, 2023

By Su Huang and Jia Han

This is a thrilling time to be working in the healthcare industry, and with real-world data (RWD) in particular. The healthcare data ecosystem is growing rapidly, privacy-preserving methods are making major strides, applications of artificial intelligence (AI) to big healthcare data are expanding, and industry stakeholders are increasingly open to using data collaborations to achieve better patient outcomes.

Here are the top 5 trends we observed:

1: More curated and disease-specific clinical data

We’ve seen clinical data providers offer more curated and disease-specific datasets rather than general-purpose ones.

In 2021, with the exception of oncology RWD providers, which have long had focused offerings, most of the EHR data partners joining the Datavant ecosystem had general, disease-agnostic datasets. In 2022, we saw a trend towards more disease-specific EHR datasets. This is driven by two factors. First, curating EHR data is expensive. To maximize the value of clinical data, abstraction, review, and data cleanup are necessary to ensure that EHR fields are entered correctly and consistently, and that physician notes from unstructured text fields are organized into structured data. The time and expense needed for such curation necessitate more focus. Second, life science companies want disease-specific offerings because clinical data is typically licensed by disease state: nearly all the data requests we see from pharma are specific to a therapeutic area.

In addition to disease-focused EHR datasets, we’ve seen growth in the number of patient registries and disease-specific registries working with Datavant. In 2022, we added new registry partners that were interested in supplementing their registry data (typically clinical data and/or patient-reported data), either by extending it longitudinally through linkage to de-identified claims data or by adding depth through access to patient-authorized medical charts via Datavant’s Medical Record Retrieval.

Disease registry data was the number one data type requested by life science companies and their analytics partners, representing one-third of all requests we received in 2022. Sought-after therapeutic areas included oncology, rare disease, women’s health, neurology, and cardiology.

2: More partners with Social Determinants of Health (SDOH) offerings

In 2021, we cited demographics and SDOH data as the number one data demand trend we observed. This growth has continued into 2022.

We’ve seen life science companies, payers, providers, and the government stay keenly focused on understanding underserved populations and improving healthcare equity. The FDA even published draft guidance recommending that sponsors submit Race and Ethnicity Diversity Plans to the agency early in clinical development. In response to this strong demand from numerous stakeholders, we’ve seen more data providers develop SDOH data offerings.

Some of these data providers are healthcare data partners who have added demographic and social risk factor data fields to their healthcare datasets. Others are consumer data companies with expertise in building datasets that include demographic, lifestyle, and behavior data, traditionally for marketing purposes, and that now offer SDOH-specific data packages for healthcare industry customers.

3: More novel real-world data types, especially genomics and imaging data

In 2022, we saw more providers of novel data types, especially genomics and imaging data, join the ecosystem. Many of these partners are building innovative solutions on top of their data, intended for life sciences, health systems, and the research community. These solutions include clinical trial recruitment workflows, patient finding for biomarker-specific treatment, population health tools for health systems, data insights platforms for clinical discovery, and the training of algorithms to detect disease earlier. Other genomics partners are leveraging Datavant’s connectivity platform to link in claims data and conduct outcomes analyses that will inform reimbursement strategy with payers. We have also seen genomics labs tokenize tested patients in order to conduct long-term follow-up on disease presentation and outcomes via those patients’ RWD.

Life science companies requested genomics data, imaging data, and other unstructured datasets such as physician notes more frequently in 2022 than in prior years. This is likely driven by interest in more oncology-related datasets, which help researchers pinpoint new biomarkers, recruit patients for trials, and diagnose patients earlier. We believe the interest in large unstructured datasets is also growing due to the continued sophistication of AI in healthcare, with many big tech companies seeking to build healthcare data competencies ranging from HIPAA-compliant warehousing to data connectivity to analytics.

4: More real-world data partners pursuing collaborations, enabled by privacy-preserving linkage

It seems like almost every day we see an announcement of a new data collaboration between healthcare stakeholders. Exciting collaborations are happening across the healthcare landscape, boosting the likelihood of novel insights and breakthrough findings. Two examples with significant promise are Mayo Clinic and Helix’s collaboration on a population health genomics study named Tapestry, and Caris Life Sciences and ConcertAI’s recent partnership to align their respective molecular profiling and research-grade clinical data capabilities. Since healthcare data is so fragmented across thousands of organizations, the full power of healthcare data cannot be unlocked until the industry takes a more collaborative approach.

Data users are also expecting more comprehensive datasets to maximize insights. Life sciences, payers, providers, government, and other client stakeholders are increasing their expectations given the expansion of data provider options, prompting providers to seek collaboration and offer data connectivity. For instance, in just a few years, we’ve seen the ability to securely link data become an essential capability requested by life sciences clients. Large data providers who have historically had standalone data offerings are responding by being increasingly open to partnership. Underpinning this more open attitude is the advancement of privacy-preserving methods to ease collaboration without sacrificing patient privacy, including advancements in the ease and speed of the HIPAA expert determination process and novel approaches to de-identifying healthcare data such as generating synthetic data.
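To make the idea of privacy-preserving linkage concrete, here is a minimal sketch in Python. It is a generic illustration, not a description of Datavant’s actual tokenization technology: it assumes a keyed hash (HMAC-SHA256) over normalized identifiers, and every name in it (`make_token`, `tokenize`, `link`, the field names, and the sample records) is hypothetical. Real de-identification also involves steps this sketch leaves out, such as reviewing the remaining fields for re-identification risk.

```python
import hashlib
import hmac


def make_token(first_name, last_name, dob, secret_key):
    """Derive a token from normalized identifiers using a keyed hash (HMAC-SHA256)."""
    # Normalize so that casing and stray whitespace do not change the token.
    normalized = "|".join(part.strip().lower() for part in (first_name, last_name, dob))
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize(records, secret_key):
    """Replace the identifying fields of each record with a token."""
    identifying = ("first_name", "last_name", "dob")
    out = []
    for rec in records:
        token = make_token(rec["first_name"], rec["last_name"], rec["dob"], secret_key)
        payload = {k: v for k, v in rec.items() if k not in identifying}
        out.append({"token": token, **payload})
    return out


def link(left, right):
    """Inner-join two tokenized datasets on their shared token."""
    right_by_token = {}
    for rec in right:
        right_by_token.setdefault(rec["token"], []).append(rec)
    linked = []
    for rec in left:
        for match in right_by_token.get(rec["token"], []):
            linked.append({**rec, **match})
    return linked


# Hypothetical key and records, for illustration only.
KEY = b"demo-key-not-for-production"

registry = [
    {"first_name": "Ana", "last_name": "Lopez", "dob": "1980-02-14", "diagnosis": "condition A"},
    {"first_name": "Sam", "last_name": "Reed", "dob": "1975-09-30", "diagnosis": "condition B"},
]
claims = [
    {"first_name": " ana", "last_name": "LOPEZ ", "dob": "1980-02-14", "claim_count": 3},
    {"first_name": "Kim", "last_name": "Tran", "dob": "1990-01-01", "claim_count": 1},
]

linked = link(tokenize(registry, KEY), tokenize(claims, KEY))
```

In this sketch, both datasets are tokenized with the same key, so records for the same person can be joined on the token without either side sharing names or dates of birth. Only the one patient present in both sources appears in `linked`.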

5: More internal, proprietary datasets are being tokenized, linked, and used to deliver insights

Life sciences companies of all sizes, from large pharma and medical device companies to small biotech and emerging diagnostics companies, have valuable proprietary data assets that are siloed across internal teams and are not used for any advanced analytics beyond the immediate purpose for which the data was initially collected. Instead of leaving these datasets in silos, companies doubled down in 2022 on unlocking their value by inventorying the datasets and connecting them both internally and externally to RWD to uncover new insights.

Life science companies gained myriad insights from unlocking proprietary datasets such as clinical trial data, specialty pharmacy data, HUB data, copay card data, device registries, diagnostic testing patients, and more. One customer identified a significant number of new ultra-rare disease patients by analyzing claims data and linking it to HUB data. Another customer linked patient support program data with closed claims to study the effectiveness of patient assistance and support programs associated with a cardiovascular drug.

Leveraging internal, proprietary data assets for insights generation is a trend that will continue as connecting data to maximize their value becomes the common standard for analytics.


Real-world data holds great promise for making our healthcare system better, including speeding up clinical trials, advancing equity in healthcare, getting specialty drugs to the right patients, and many more use cases. This promise is leading to expansion in the number of RWD providers, new interest from clients in unlocking value from proprietary data assets, and advancement of technology to protect patient privacy while retaining data’s analytical value. We envision a world where every decision in healthcare is informed by data, and there has never been a more exciting time than today!

If you want to learn more about the Datavant healthcare data ecosystem, submit a request here.
