
Data Analytics in Healthcare: Ecosystem Overview

Travis May
November 15, 2018

This is part of a series on how the health data ecosystem is organized. Datavant’s vision is to connect the world’s health data to improve patient outcomes and bring new treatments to patients faster. In our business, we see a wide range of health data sources and uses; we’ve tried to categorize these.


In The Fragmentation of Health Data, I gave an overview of where health data comes from and how it flows across the healthcare ecosystem. This post will focus on the analytical uses for health data — both today and in the future.

The health analytics taxonomy is organized around the major end users of health analytics:

  • Patients
  • Providers (hospital systems, doctors, etc.)
  • Life sciences companies (pharma and its service providers)
  • Payers (insurance companies, etc.)

Patient Analytics

The patient is the ultimate end user in the healthcare system. All of the other solutions in this article focus on improving patient outcomes to some degree. Yet there are also solutions that aim to go directly to the patient without payers, providers or life sciences companies acting as intermediaries, including:

Health and Wellness:

  • Self-diagnostic tools use machine learning to leverage the best medical knowledge from physicians around the globe. While many diagnostic tools are being designed to support physicians, there are also direct-to-patient solutions that allow patients to check their symptoms online and develop an independent view before going to see their physician. (Example: Symptom Checker from WebMD)
  • Health prediction and tracking tools allow patients to track fitness activity, heart rate and sleep cycle. Many individuals use wearable devices today as both an incentive to exercise and as a way to better understand their own health. These devices generate a huge amount of data, which analytics companies are using to provide personalized health recommendations. (Examples: Fitbit, Apple Watch, Glooko)
  • Omics analytics are built on now widely available tests that let patients have their genome (or proteome, or metabolome) sequenced. This information can be used to provide descriptive analytics of genetic health risks or other personalized health guidance. Over time, omics are likely to become part of the basic standard of care, but today many patients pay for these tests and applications themselves. (Examples: Ancestry, 23andMe, Helix)

Information Access: A separate class of applications is primarily focused on giving patients direct access to their health data, regardless of which provider they saw for a visit. (Examples: Picnic Health, Ciitizen)

Provider Analytics

Though still in their infancy, provider health analytics have taken off in the last decade with the mass adoption of electronic health records (EHRs). Provider analytics solutions fall into two broad buckets: clinical solutions, listed first, and finance and operations solutions.

  • Patient history tracking through electronic health records (electronic versions of patients’ medical histories, maintained by the provider) is foundational to many of the other use cases discussed here. It ensures that providers can see past medical history, medications, vital signs, immunizations, laboratory data and radiology reports when they sit down with a patient, and can begin to understand the individual patient’s health trends. (Examples: Epic, Cerner, AllScripts)
  • Clinical decision support algorithms use personalized data to identify patients who are at risk of complications, to recommend medications and dosing and to predict likely treatment results based on clinical, monitoring, genomic and other data sources. (Examples: Syapse, IBM Watson Health, Tempus, Precision Health AI)
  • Clinician communication. In order to deliver high-quality care and ensure continuity for the patient, clinicians must be able to easily exchange information on shared patients. Too often, this still takes place by fax or email. Today, integrated solutions can be embedded in EHR systems and also draw on provider-specific data to identify the best specialist for a given issue or to locate an available technician. (Examples: PerfectServe, TelMedIQ)
  • Population analytics solutions take in insurance claims information and other third-party data to enhance provider understanding of the health of the local population, forecast demand in relation to diseases and make investment decisions around staffing and care. (Examples: Appriss Health, BaseHealth, HealthEC, Philips Wellcentive)
  • Real-world evidence solutions attempt to carry clinical research beyond Phase III randomized controlled trials and look at patient outcomes after treatments go to market (more on this in the Life Sciences solutions below). From a provider’s perspective, independent research or a life sciences collaboration often translates into requirements about how caregivers enter data in EHRs. (Examples: Aetion, Precision Digital Health)
  • Clinical problem tracking and safety analytics solutions help providers monitor common challenges like infection and often take in data about patient interactions with caregivers and other patients while in the hospital. While providers have made substantial improvements, it is still the case today that about 1 in 25 hospital patients has a healthcare-associated infection. (Examples: Conifer Health, Evolent Health, Vigilanz)
  • Patient engagement and care management solutions are focused on medication adherence and targeted healthcare interventions. These solutions utilize clinical data, as well as socioeconomic and behavioral data. (Examples: CipherHealth, HealthLoop)

Finance and Operations:

  • Quality and compliance solutions draw on EHR and medical billing data to allow provider institutions to ensure that they are compliant with specific regulatory and payer requirements, with a focus on descriptive modeling and the ability to flag specific risks. (Examples: Conifer Health, Evolent Health)
  • Revenue cycle management solutions help providers manage billing, collections and accounts receivable. Given the complexity of handling patient co-pays, deductibles, insurance claims, reimbursements, and fraud detection, providers require specialized solutions, which draw on clinical and claims data. In some cases, these are largely workflow solutions with descriptive reporting, though many sophisticated providers also forecast patient-specific payment and reimbursement risk. (Examples: nThrive, Epic)
  • Pharmacy analytics draw on pharmacy data both within and external to the provider, and are used to manage pharmacy supply chains, make the dispensing process smoother and more patient-friendly, and increase patient adherence. (Examples: Vizient, McKesson, Sentry Data Systems, Amplicare)
  • General operational solutions like occupancy analytics (forecasting bed occupancy rates), cost management (comparing to industry benchmarks), marketing, performance tracking (tracking length of stay, readmissions, etc.) and supply chain analytics (optimizing hospital supply inventory) are all common and uniquely tailored to healthcare. (Examples: MedeAnalytics, Trilliant, KenSci, Vizient)
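
To make the performance-tracking metrics above concrete, here is a minimal sketch of how average length of stay and a 30-day readmission rate might be computed. The admission records, function names and the 30-day window are illustrative assumptions, not any vendor's method; real readmission measures apply exclusions (planned readmissions, transfers and so on) that this sketch ignores.

```python
from datetime import date

# Hypothetical admissions: (patient_id, admit_date, discharge_date).
stays = [
    ("p1", date(2018, 1, 1), date(2018, 1, 5)),
    ("p1", date(2018, 1, 20), date(2018, 1, 23)),
    ("p2", date(2018, 2, 1), date(2018, 2, 3)),
    ("p3", date(2018, 3, 1), date(2018, 3, 11)),
    ("p3", date(2018, 6, 1), date(2018, 6, 2)),
]


def average_length_of_stay(stays):
    """Mean number of days between admission and discharge."""
    return sum((discharge - admit).days for _, admit, discharge in stays) / len(stays)


def readmission_rate(stays, window_days=30):
    """Share of discharges followed by the same patient's readmission within the window."""
    # Sort by patient, then admission date, so consecutive rows are consecutive stays.
    ordered = sorted(stays, key=lambda s: (s[0], s[1]))
    readmitted = 0
    for current, nxt in zip(ordered, ordered[1:]):
        same_patient = current[0] == nxt[0]
        gap_days = (nxt[1] - current[2]).days  # next admission minus this discharge
        if same_patient and 0 <= gap_days <= window_days:
            readmitted += 1
    return readmitted / len(ordered)


print(average_length_of_stay(stays))
print(readmission_rate(stays))
```

The denominator here is every discharge in the sample, which is the simplest possible choice; a production measure would define eligible index discharges more carefully.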

Life Sciences Analytics

Life sciences companies have always been data companies. As Dr. Luca Finelli, head of the Predictive Analytics & Design group within Global Drug Development at Novartis, recently said, “In reality, we are a data company…If we are able to bring our data into one place and tap into the latest computing technologies, we can generate new insights that in the past were difficult to obtain because our data was locked in silos.” Dr. Finelli’s view is one that is now common to leading life sciences companies, which are using their data to develop novel and exciting analytical approaches.


  • Drug candidate selection requires the collection, processing and analysis of enormous volumes of biological, chemical and clinical data to identify potential candidates. There is tremendous analytical horsepower (not to mention computing power!) currently focused on more efficient ways to understand potential causes of and treatments for disease. (Examples: BenevolentAI, BERG Health, Schrodinger)


  • Just as patient tracking is the foundational use case for providers, clinical trial result tracking is the backbone of the analytic work performed by life sciences companies. Through electronic data capture tools and clinical trial management systems, life sciences companies gather clinical data throughout the life of a trial that allows them to perform efficacy and safety analysis. (Examples: Comprehend, Medidata, Medrio)
  • Clinical trial optimization solutions are focused on a range of problems, from how to design and carry out adaptive trials to how to more efficiently recruit patients and improve site selection, monitoring and management. For patient recruitment, in particular, leading firms have begun to draw on socioeconomic and consumer data to build effective recruitment campaigns. (Examples: Acurian, Parexel, TriNetX)
  • Drug repurposing may occur when a drug does not show efficacy in its initial clinical trial, but upon further analysis does appear to be efficacious for a subpopulation that might share demographic, genetic or other traits. Approaches to cohort analysis have become much more sophisticated in recent years as life sciences companies find themselves with significantly more data to inform a potential repurposing. (Examples: Biovista, Excelra)
  • Real-world evidence has already begun to change how healthcare players think about drug development. Randomized controlled trials remain the gold standard for evidence-based medicine, but the efficacy-effectiveness gap between how a treatment performs in a controlled trial and how it performs in the real world is well documented. Leading life sciences companies are drawing on EHR, CPOE, claims, socioeconomic and behavioral data to better understand patient outcomes under conditions of everyday clinical practice. (Examples: Cota Healthcare, Flatiron Health, Parexel, Tempus, Precision Health AI)

Sales and Marketing strategies in the life sciences are still catching up to those in other industries. Part of the gap is explained by the complexity in life sciences, including regulatory requirements around pharmaceuticals and the challenges of navigating formularies and reimbursement strategies. Part of the gap is explained by the wide margins enjoyed by life sciences companies, which have not always incentivized innovation. Today, however, both in-house and third-party teams are developing novel analytical approaches.

  • Physician targeting analytics have historically examined prescription volume in order to divide physicians into deciles and then allocate their sales reps’ efforts accordingly. These lists were often stale and, because the data was shared among competitors, there was limited opportunity to gain a competitive edge. Today, companies are offering more advanced analytics that — among other things — attempt to differentiate between influencers and refillers and identify signals of potential switching. In short, the move is from descriptive to predictive analytics. (Examples: Marketware, Veeva)
  • Rare disease patient finding. Locating patients is a challenge in many therapeutic areas, but it is particularly acute for rare disease companies. Leading companies are now applying machine learning to prescription and diagnostic data sets to help locate potential patients. (Examples: RDMD, Swoop, Syneos Health)
  • Distribution and sales analytics. A number of companies ingest both internal and external data to develop cost-effective and efficient drug distribution strategies, as well as to generate insights around the efficacy of their sales strategies. (Examples: Axtria, Inovalon, Komodo Health, Marketware, SHYFT)
  • Health Economics and Outcomes Research has been a standard function in biopharmaceutical companies for many years, drawing on clinical data to study drug utilization and outcomes in real-world settings, and combining these analyses with healthcare cost studies in order to inform access and pricing conversations with payers. The main factor that has changed in recent years is the sheer volume of real-world data that leading companies can draw on. (Examples: Cardinal Health, Syneos Health, XCenda, Precision Health AI)
  • Pharmacovigilance solutions collect data on adverse events experienced by patients taking pharmaceuticals, and can draw on physician and patient reports, as well as thorough analyses of clinical data. (Examples: Cognizant, Elsevier, Highpoint)
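
The traditional decile approach to physician targeting described above can be sketched in a few lines. This is an illustration under stated assumptions: the physician ids and prescription counts are made up, `assign_deciles` is a hypothetical helper, and the split uses equal-count deciles (some practitioners instead cut deciles by cumulative share of volume).

```python
def assign_deciles(rx_volume):
    """Map each physician id to a decile from 1 (lowest volume) to 10 (highest)."""
    ranked = sorted(rx_volume, key=rx_volume.get)  # ids in ascending order of volume
    n = len(ranked)
    # Integer arithmetic places rank 0 in decile 1 and rank n-1 in decile 10.
    return {pid: (rank * 10) // n + 1 for rank, pid in enumerate(ranked)}


# Hypothetical data: 20 physicians with made-up annual prescription counts.
rx_volume = {f"dr_{i:02d}": 50 * i for i in range(1, 21)}

deciles = assign_deciles(rx_volume)

# A sales team following the descriptive approach would focus on the top decile.
top_targets = sorted(pid for pid, decile in deciles.items() if decile == 10)
print(top_targets)
```

A ranking like this is purely descriptive: it says who prescribed the most in the past, not who is likely to switch, which is the gap the predictive approaches aim to close.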

Payer Analytics

Data has always been core to payers’ business models, with actuarial science predating both modern financial theory and what we now broadly call “data science.” However, in healthcare, the set of questions faced by a payer is broader than it might be for other types of insurers, where analysis is primarily focused on probability of loss, cost of loss, and pricing competitive but profitable premiums. Today, leading payers function as both insurers and health advocates for those they cover, and data analytics are an essential component of their ability to improve patient outcomes.

Provider, Patient and Customer Solutions:

  • Risk management solutions focus on sharing financial risk between payers and providers in an effort to improve patient outcomes and contain costs (often included under the umbrella of value-based care). There are many types of risk management. Evidence-based medicine (EBM) standards provide an agreed-upon framework between providers and payers that payers can use to assess providers’ relative performance. Increasing transparency around EBM also allows patients to make informed selections between different covered providers.
  • Pay-for-performance solutions tie provider reimbursement to the provider’s performance according to the EBM standards they’ve agreed to. (Examples: Inovalon, Milliman MedInsight, Verscend Technologies)
  • Disease or care management solutions focus on identifying covered patients who are most at risk of a poor health outcome and developing targeted interventions for those patients. This intervention could be as simple as a reminder to take a pill, or it could be as involved as a health and wellness program that looks at factors like diet and exercise. Such solutions may draw on EMR data, as well as prescription, claims, socioeconomic and behavioral data. (Examples: IBM Watson, Cognizant, Collective Health, Medecision, PopHealthCare)
  • Customer analytics solutions are focused on cost management for those who purchase insurance (in most cases, employers). This analysis may simply assess areas and levels of coverage, or the focus may be on helping a maturing company understand the healthcare cost impact of the shifting demographics of their employees. (Examples: Inovalon, Milliman MedInsight, Verscend Technologies, Truveris)

Finance and Operations:

Of the general operational solutions employed by payers, fraud detection (targeting upcoding, false diagnoses, billing for non-covered services, bribery and kickbacks) remains an area of significant focus. It is frequently estimated that between 3% and 10% of the United States’ over $3 trillion in annual healthcare spending is lost to fraud. The analytical challenge is worsened by the fact that false claims from a given provider are often spread across public and private payers. Actuarial analysis remains core to payer data analytics, as do general operational monitoring and financial forecasting. (Examples: Experfy, IBM, Optum, SAS)
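
For a sense of scale, the 3% to 10% estimate can be turned into dollar figures with simple arithmetic. The sketch below treats $3 trillion as a floor, since the text says "over $3 trillion", so the results are lower bounds on the implied range rather than precise figures.

```python
# Assumption: use $3 trillion as a floor for annual US healthcare spending.
annual_spend = 3_000_000_000_000

# Integer arithmetic avoids floating-point rounding on very large dollar amounts.
low_estimate = annual_spend * 3 // 100    # 3% of spending
high_estimate = annual_spend * 10 // 100  # 10% of spending

print(f"${low_estimate // 10**9}B to ${high_estimate // 10**9}B lost to fraud per year")
```

On these assumptions, the implied loss is roughly $90 billion to $300 billion per year.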


The Future of Health Analytics

Historically, the vast majority of health analytics investment has gone to sales and marketing use cases, with additional spending focused on finance and operations solutions for both payers and providers. A spend-weighted taxonomy would look something like this:

Over the next ten years, I predict both substantial growth in health analytics investment overall (uncontroversial), and — on a percentage basis — a relative shift toward drug discovery, development and patient-focused applications.

When it comes to the untapped societal and economic benefits of health data, I believe the most value lies in the following areas:

  1. Clinical trial optimization, which could both increase the likelihood of success in clinical trials and drastically reduce the time and cost of developing new drugs.
  2. Real-world evidence studies, which could lead to a much more sophisticated understanding of how patients respond to different treatments outside of the clinic (and also reduce the time and cost of the drug development process).
  3. Precision medicine, which can lead to patient care that is more preventive, predictive and personalized.

In other words, a spend-weighted taxonomy might look something like this 10 years from now:

These categories are high-value if solved, but solving them requires both more data and more health analytics than many of the lower-hanging commercialization questions do. If I’m directionally correct, then the huge growth in health analytics could completely transform the healthcare system and deliver significant benefits for patients. There truly has never been a more exciting time to be working in health data and health analytics.


Special thanks to Bob Borek, who was the primary author of this post, and to Jacob Stern, who prepared early drafts. This taxonomy and our predictions were guided by discussions with Datavant clients, presenter lists at HIMSS and other conferences, and the excellent book Analytics in Healthcare and the Life Sciences. If you’re interested in continuing the conversation directly, please drop us a note here.

Disclosure: many of the example companies listed are clients of Datavant, and I have personal angel investments in several of the companies.

Editor’s note: This post was updated on October 21, 2022 for accuracy and comprehensiveness.

The Health Analytics Ecosystem was originally published in Datavant on Medium, where people are continuing the conversation by highlighting and responding to this story.
