Four Challenges the Health Data Ecosystem Needs to Solve in 2023

Publish Date: October 17, 2022

This summer, Datavant convened an inaugural Product Council – composed of thought leaders and health data super users within our partner ecosystem – to discuss the biggest challenges in health data exchange and share ideas about how we can work together to build solutions. This post highlights key insights from those conversations, which are shaping the way Datavant thinks about collaboration across the health data ecosystem.

The real-world data ecosystem is continuing to expand – and as it does, the rate of fragmentation is outpacing aggregation

Every day, more and more patient data becomes available within the healthcare ecosystem. The increased availability of data has enabled the healthcare industry to meaningfully improve patient experience and care by better understanding disease outcomes, more accurately assessing a patient’s real-world experience with care delivery, and more. This novel and emerging data, however, is fragmented – spread across a disparate and disconnected set of sources – and the pace of fragmentation is accelerating.

As the data ecosystem’s expansion and fragmentation continue to accelerate, the increased complexity creates a new set of challenges. Each of these emerging challenges is unique, and every solution will require collaboration from across the industry.

The evolution of novel datasets and the tools to leverage them is bringing better insights within reach. It would be a great loss not to work together to deploy these capabilities and drive solutions that improve human health.

Informed by a series of conversations with Datavant’s Product Council, we reflect on the biggest bottlenecks in health data exchange today, and highlight the opportunities that exist for our industry to work together and address them.

Table 1: Types of health data, where it comes from, and opportunities and challenges of this data.

Build tools and expertise to help navigate the expanding landscape of data

Expansion and fragmentation are driving two currents in the health data ecosystem. The first is that expanding data availability is creating demand for more data, and for specialized data. As novel data sources become more readily available, it becomes possible to deliver more comprehensive and impactful analyses.

Fifteen years ago, the limit of a health data analysis might have been using prescription data to understand which providers were prescribing a therapeutic. Today, the broader availability of claims, clinical, and social determinants data means that companies can assess which patients are adhering to the treatment, and why. More and more, the most innovative data users intent on delivering industry-leading analytics and solutions need to leverage unique sources of data.

Simultaneously, greater fragmentation means that these unique data sources are available through specialized providers, who may not be easy to find or be operating at scale. More data, and more specialized data, is becoming vital for key use cases. The right data, however, is more difficult to find.

The result is that just finding the right partner presents an immediate bottleneck to exchange, and equipping everyone in the healthcare ecosystem with the tools to navigate the landscape becomes foundational to sharing data.

The right solution starts with showing data users where to find, for example, lab test records that they need to better understand the rate of rare disease diagnosis and misdiagnosis. The solution could also enable data users to make standardized comparisons across data providers – between two “third party” data providers, and between a third party and their own internal “first party” data – as well as help data providers vet potential users.

Technology is part of the answer: assessment tools which facilitate dataset exploration, overlap comparison, and segmentation. The complete solution also has a place for “expert services”: subject matter experts who deeply understand the landscape of available data and can offer consulting “horizontally” across the industry, without being constrained to a specific organization.
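As a rough illustration of what an overlap comparison might compute, here is a minimal Python sketch. It assumes each party has already reduced its patient records to de-identified tokens; the function name `overlap_report` and the token values are hypothetical and are not drawn from any particular product.

```python
def overlap_report(tokens_a, tokens_b):
    """Compare two collections of de-identified patient tokens."""
    a, b = set(tokens_a), set(tokens_b)
    shared = a & b
    union = a | b
    return {
        "size_a": len(a),
        "size_b": len(b),
        "shared": len(shared),
        # Fraction of each dataset's patients that also appear in the other
        "share_of_a": len(shared) / len(a) if a else 0.0,
        "share_of_b": len(shared) / len(b) if b else 0.0,
        # Jaccard index: shared patients over all distinct patients
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Hypothetical tokens for a first-party and a third-party dataset
first_party = ["tok_01", "tok_02", "tok_03", "tok_04"]
third_party = ["tok_03", "tok_04", "tok_05"]

report = overlap_report(first_party, third_party)
print(report)
```

A data user could run the same report against several candidate providers to see which one adds the most patients not already covered by first-party data.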

Table 2: Questions that health data users and providers want to know.

Create technologies to enable transparent, controlled, and easy analysis on connected data

Finding and evaluating data providers is the first but not the last challenge. Solving complex health problems increasingly requires combining disparate datasets in novel and complicated ways.

As the quantity of data skyrockets and introduces new risks for patient privacy, responsible and ethical data governance demands that data combination be done in a way that protects patient privacy and creates transparency and accountability for all the parties involved. Enabling the appropriate management, governance, and application of data will take an ecosystem of exceptional technology companies.

The unmet need is for data sharing technologies that give every partner control over their records and confidence in the stewardship of their data. Businesses that collaborate to solve one problem may later compete to solve another – and that knowledge informs their willingness to share data today.

Data providers want to keep their data independent of their peers and competitors, to maintain competitive advantage and to minimize the risk of inadvertent or unauthorized combinations. Data users want to be able to easily analyze connected data without being constrained to a particular provider. Everyone wants transparency into how the data is being processed and applied.

One solution is connecting health data through a trusted third party. These data stewards ought to be neutral enclaves – they don’t buy or sell data – and be built with security and privacy as the highest priority.

Other solutions lie in applying novel privacy-preserving technologies to health data exchange. Federated technologies, such as Multi-Party Computation (MPC), provide a mechanism to exchange information about a dataset without that dataset ever needing to leave its original environment. Emerging technologies like synthetic data enable the creation of datasets which contain no real patients’ information, but in aggregate aim to produce closely matching analytical results – enabling insights without requiring the underlying data to be shared.
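To make the MPC idea concrete, here is a toy Python sketch of one of its simplest building blocks, additive secret sharing, used to compute a total across sites without any site revealing its own count. It is an illustration under simplifying assumptions (honest participants, non-negative integer inputs, no networking), not a production protocol; the names `make_shares`, `secure_sum`, and `MODULUS` are hypothetical.

```python
import secrets

# All arithmetic is done modulo a large prime; inputs must be
# non-negative integers smaller than this value.
MODULUS = 2**61 - 1


def make_shares(value, n_parties):
    """Split a value into n random-looking shares that sum to it (mod MODULUS)."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values):
    """Simulate each party sharing its value and publishing only partial sums."""
    n = len(private_values)
    # all_shares[i][j] is the share that party i sends to party j
    all_shares = [make_shares(v, n) for v in private_values]
    # Party j adds up the shares it received; a single share reveals nothing
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Only the partial sums are combined, which yields the overall total
    return sum(partial_sums) % MODULUS


# Hypothetical patient counts held privately by three sites
site_counts = [120, 45, 310]
total = secure_sum(site_counts)
print(total)
```

In this simulation all parties live in one process; in a real deployment each site would generate its shares locally and send only those shares to its peers.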

However these solutions are combined, cross-industry collaboration is key. The data stewards and technology providers will need to be able to partner seamlessly across the health data ecosystem, not only with any data provider and any data user, but also with any other technology partner providing additional tools for analysis, privacy, governance, or linking.

Drive data standardization, quality, and transparency at scale

Real world data is frequently non-standard and incomplete. This “messiness” characterizes the real world data that is available today as well as the new data types that are emerging. Solving the resulting challenges will require leading companies to work together.

Standardization is the first challenge. Today, data is collected in nearly as many formats as there are points of collection. Standard data models exist, but there are many of them and they apply to different data types. On top of that, large swathes of the industry lack the understanding or the right incentives to comply. The end result: data recipients spend precious resources standardizing different datasets before records can even be combined and an analysis begun.
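The kind of work this imposes on data recipients can be sketched in a few lines of Python. The example below assumes two hypothetical feeds that encode the same two fields differently; the feed names, field names, and code mappings are invented for illustration and do not correspond to any specific data model.

```python
from datetime import datetime

# Hypothetical per-source rules: how to rename fields and parse dates
SOURCE_FORMATS = {
    "lab_feed": {
        "rename": {"DOB": "birth_date", "SEX": "sex"},
        "date_format": "%m/%d/%Y",
    },
    "claims_feed": {
        "rename": {"birthDate": "birth_date", "gender": "sex"},
        "date_format": "%Y%m%d",
    },
}

# Hypothetical mapping of free-form sex values to a single code set
SEX_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def standardize(record, source):
    """Map one source record onto a shared schema with ISO dates."""
    spec = SOURCE_FORMATS[source]
    out = {target: record.get(src) for src, target in spec["rename"].items()}
    if out.get("birth_date"):
        parsed = datetime.strptime(out["birth_date"], spec["date_format"])
        out["birth_date"] = parsed.date().isoformat()
    if out.get("sex"):
        # Unrecognized values fall back to "U" (unknown)
        out["sex"] = SEX_CODES.get(out["sex"].strip().lower(), "U")
    return out


a = standardize({"DOB": "03/07/1984", "SEX": "Female"}, "lab_feed")
b = standardize({"birthDate": "19840307", "gender": "f"}, "claims_feed")
print(a, b)
```

Every additional source needs its own entry in the rules table, which is why the cost grows with the number of points of collection.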

The second challenge is data quality. Real world data is incomplete, and this “missingness” inhibits the accuracy of downstream insights. When a dataset has high missingness, one can’t be certain that it captures the full patient journey.

The patient outcomes that one aims to better understand may not have occurred – or they may not have been captured in the dataset. Missingness also limits a dataset’s linkability through the omission of elements necessary to reliably connect with other records, further impeding the ability to capture the full patient journey.
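One simple way to quantify these two effects is to measure, for a given dataset, how often each field is empty and what fraction of records still carry every element needed for linking. The Python sketch below assumes records arrive as dictionaries; the field names, the choice of linking fields, and the sample values are all hypothetical.

```python
def missingness(records, fields):
    """Return the fraction of records in which each field is empty."""
    total = len(records)
    rates = {}
    for field in fields:
        missing = sum(1 for r in records if r.get(field) in (None, ""))
        rates[field] = missing / total if total else 0.0
    return rates


def linkable_fraction(records, link_fields):
    """Return the fraction of records with every linking field populated."""
    if not records:
        return 0.0
    complete = sum(
        1 for r in records
        if all(r.get(f) not in (None, "") for f in link_fields)
    )
    return complete / len(records)


# Hypothetical records; None and "" both count as missing
records = [
    {"first_name": "A", "dob": "1980-01-01", "zip": "02139", "outcome": "remission"},
    {"first_name": "B", "dob": None, "zip": "10001", "outcome": None},
    {"first_name": "", "dob": "1975-06-30", "zip": "", "outcome": "relapse"},
    {"first_name": "D", "dob": "1990-12-12", "zip": "94107", "outcome": None},
]

rates = missingness(records, ["first_name", "dob", "zip", "outcome"])
linkable = linkable_fraction(records, ["first_name", "dob", "zip"])
print(rates, linkable)
```

Note that a missing outcome in this sketch cannot distinguish between an event that never happened and one that was never recorded, which is exactly the ambiguity described above.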

By the time data gets to an end user, the reason for any missingness is also often obscured. Obscurity creates a third challenge because it impedes reliability – insights are only as strong or as weak as the data that powered them. In particular, transparency and provenance challenges are a critical impediment to broader acceptance of real-world data for purposes like regulatory approval.

Without clear knowledge of a dataset’s strengths and weaknesses, it’s difficult to anticipate the blind spots in analysis, be that patient populations that may be underrepresented or outcomes that it may be predisposed to miss.

These challenges are costly for companies in terms of time and dollars. The cost to patients can mean big differences in health outcomes.

Solving the problem at scale requires many actors. There is a role for industry associations, trade groups, and other stakeholders to align to a more uniform standard within specific data types.

Regulators can advance standardization by implementing the right incentives. Data users can influence the equation by applying buy-side pressure. As achieving greater data transparency and quality becomes a competitive differentiator, data providers will rise to the expectations of data users and regulatory bodies governing use of RWD.

The clearest unmet need, however, is in technology. This is a particular opportunity for partnership across a constellation of exceptional technology companies. As the health data ecosystem expands and brings more and more “messy” data online, there is a rapidly expanding opportunity for specialized companies to collaborate on different pieces of the answer: from data cleaning, to harmonization, to imputation.

Capture the value and insight within unstructured and emerging patient data

Unstructured data is a growing opportunity for exchange and analysis. The real-world data that powered the first sea change in data-based decision-making has been dominated by structured data (data that has some standardized format or data model and follows a persistent order). As the health data ecosystem continues to expand, however, much of the novel data that is coming online – from clinicians’ notes to pathology images to patient reported outcomes and genomic sequencing data – is unstructured data.

The potential value in unstructured data is enormous. Clinical endpoints that illustrate the success of a treatment program, such as the size and stage of a cancerous tumor, may only be available in patients’ imaging data. Unstructured clinician notes may describe a patient’s quality of life or even medical history beyond the structured data entry. A patient’s genomic sequencing may provide critical insight into the likely benefit provided by a precision therapeutic.

Extracting that value is challenging. Acquiring the right data presents an immediate obstacle, as it’s difficult to determine from among a collection of notes or images which correspond to the right patients and which contain information relevant to the use case.

De-identification, when necessary, is also a tough problem to solve since identifiers could be a stray word or genomic detail. Finally, deriving insight from the data is a challenging process, requiring either complex programs that can process the unstructured data or tools that can first structure the information for easier utilization.

There are learnings to draw from existing solutions for structured data exchange. Tools exist that can locate specific patients’ records. Expert statisticians can assess a dataset’s identification risk and recommend steps to de-identify it. Analytics programs can derive critical insights from even massive tables of data.

Many of these tools, however, rely on structured data inputs. There is a great unmet need for partner technologies that can translate these capabilities and apply them to unstructured datasets.

Success requires cooperation

Solving these challenges is an opportunity to unlock a step change in how health data is shared to power better insights and improve patient outcomes. The ambition is a future of healthcare where the right data can be easily found, combined, and applied to generate trustworthy insights and drive rigorous, data-based decision making.

Achieving that ambition will require technology, expertise, and collaboration from across the health data ecosystem. Getting it right means committing to work alongside a broad constellation of partners who bring unique and complementary strengths – and relentlessly pursuing the “win-win-win” opportunities that allow everyone to play to them.

At Datavant, we’ve made the commitment to collaborate across the health data ecosystem, solve this next wave of challenges, and achieve a data-driven future of healthcare. If you share that vision, let’s get in touch. Reach out to us at

Prepared by Quinn Johns, Vera Mucaj, and Su Huang

Editor’s note: This post was updated in December 2022 for accuracy and comprehensiveness.
