
Evolution of Electronic Health Record (EHR) Data

Publish Date: April 28, 2022

An interview with Stacey Long (Chief Strategy Officer, OMNY Health)

The volume of health data is growing exponentially. Even more notable, patient data is becoming more readily available in privacy-compliant formats for a variety of healthcare stakeholders to use in research, care improvement, and reimbursement. In my blog post on healthcare data supply trends, I highlighted how electronic health record (EHR) data extracted from EHR software, disease registries, and health systems was the most common data type that partners brought to the Datavant ecosystem last year. To understand this trend, I interviewed Stacey Long, Chief Strategy Officer at OMNY Health, about the evolution she has seen in the EHR data landscape.

Stacey, you have had a long career in real-world data and analytics. Can you describe your background and experiences?

Thanks, Su. I’ve been working in the real-world data (RWD) space for more than 25 years, first as a health services researcher, and later building products and tools to support researchers from government, provider, payer, and life sciences organizations through my positions with Thomson Reuters, Truven Health Analytics, and IBM. For the past year and a half, I have focused on building out product strategy and services operations at OMNY Health. OMNY Health’s RWD-focused platform connects provider organizations, life sciences companies, and patients for the purpose of sharing data and insights and establishing collaborative research and quality improvement initiatives.

Can you level-set for our audience – what information can you get from EHR data?

EHR data consists of deep clinical content to support a diverse set of care, reimbursement, and research initiatives. EHR systems capture diagnoses, procedures, pharmacy orders, and drug administrations relevant to specific patient encounters, as well as vitals (e.g., BMI, blood pressure, oxygen saturation), lab orders and results, care setting, and dates of the services delivered by the provider. Additionally, medical history, family history, details on the clinician delivering the care, and clinician notes describing other observations and the rationale for decisions are often available to add clinical context to these data elements.

What do you think is driving the increasing availability of EHR data?

Foundational to this trend is the nearly universal adoption of EHRs by providers over the past decade, driven by federal incentives to adopt EHR systems and CMS quality reporting requirements. Beyond that, I believe there are two driving forces behind the increased amount of EHR data entering the RWD landscape. First, the acceptance of EHR-derived data to support evidence-based decisions has been growing both within provider organizations and across the broader healthcare ecosystem, most notably with the recent FDA guidance on using EHR data to support clinical trials and evidence generation. Second, technological advances have made obtaining and gaining insights from EHR data easier than ever. The consolidation of EHR vendors from hundreds to fewer than 20 groups has resulted in more uniformity and standardization across providers, more complete and accurate data, larger and more diverse patient populations, and more efficient data extraction. In addition, advancements in data science tools that mine unstructured notes through natural language processing (NLP) and machine learning (ML) have opened up a whole new set of analytics, increasing the robustness of insights derived from EHR data. Lastly, there are now privacy-preserving technologies that de-identify EHR data to maximize its utility while preserving patient privacy.

Beyond EHR software companies, health systems and specialty provider networks are becoming active collaborators in supporting research initiatives aimed at workflow efficiency, accelerating the development of new therapies, and improving patient outcomes. Health systems and specialty networks are making investments in the data they are generating to meet these goals, as well as utilize data for their own care management initiatives, quality improvement goals, and population health programs.

Where are you seeing EHR data being used the most? Which disease areas and use cases?

With the trend toward precision medicine, we are seeing requests for EHR data across all disease areas, but especially in therapeutic areas where disease severity and treatment effectiveness are measured through biometric changes in vitals, lab values, imaging, physician-reported severity scores, and more recently patient-reported outcomes (PRO).  We are seeing these measures used in dermatology, autoimmune disease, ophthalmology, orthopedics, respiratory, cardiometabolic disease, and oncology. This data is being used to help identify undiagnosed rare disease populations, improve diversity of clinical trials, and address healthcare inequity.

With the acceleration in data availability and technology capability, do you see new applications for EHR data in the future that may have an even bigger impact on patient outcomes?

Emerging use cases for EHR data have centered around improving clinical trial efficiency, such as identifying eligible patients faster or using RWD to create a synthetic control arm. EHR data is increasingly linked to data collected through clinical trials, resulting in hybrid clinical trial and real-world datasets for a deeper understanding of the patient journey outside the trial. Additionally, EHR data is informing pharmacovigilance and safety monitoring during care delivery to reduce provider reporting burden and offer more comprehensive reporting to manufacturers and regulatory bodies. For example, rather than relying on manual reporting systems such as FAERS or MAUDE, EHR data is being mined retrospectively for safety signals and long-term effectiveness, and EHR software vendors are adding capabilities to report events during encounters. For this use case, it is important to have transparency in the data source and data curation methodology in order to meet FDA auditability requirements. The FDA emphasizes the importance of data transparency in its recent guidance on using EHR-sourced real-world data and registry data. Another emerging use case is the growing number of AI companies leveraging EHR data to develop predictive models to detect disease earlier or predict major clinical events. We are also seeing a growing trend in developing and integrating quality initiatives with EHR data and technology. Lastly, we are seeing a trend of EHR data being linked to complementary data sets such as claims data, which can contextualize the whole patient journey.

There are a lot of EHR data providers in the landscape — EHR software companies, disease registries that extract data from many EHRs, or research networks of health systems. What are some considerations for buyers when choosing an EHR data provider?

The right EHR data source really depends on the intended use case, which determines the data variables needed. As an industry, we are fortunate to have an increasing number of options for RWD sources.  Some of the evaluation criteria for EHR sources include:

  • Availability of the data variables needed for the analysis
  • Representativeness of the source to capture the relevant care providers and their treated patient population
  • Patient population size and whether it provides enough statistical power for the study
  • Completeness and longitudinality of the data as it relates to the intended use case
  • Cleanliness and degree of normalization applied to the data
  • Auditability of source data as needed for regulatory use cases
  • Ease of contracting and cost to procure the data
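
One common way to make the population-size criterion above concrete is a sample-size calculation. The sketch below uses the standard normal-approximation formula for comparing two proportions; the outcome rates (30% vs. 40%) and the available cohort size are hypothetical numbers chosen for illustration, not figures from this interview.

```python
import math
from statistics import NormalDist


def patients_per_group(p1: float, p2: float,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients needed per group to detect p1 vs. p2 (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    p_bar = (p1 + p2) / 2                          # pooled proportion under H0
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Hypothetical study: response rate of 30% on one therapy vs. 40% on another.
required = patients_per_group(0.30, 0.40)

# Hypothetical count of eligible patients per group in a candidate data source.
available_per_group = 500

print(f"required per group: {required}")
print(f"source is large enough: {available_per_group >= required}")
```

Smaller expected differences push the required cohort up quickly, which is one reason a source that is adequate for one study may be too small for another.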

Some general-purpose EHR systems are used by clinicians treating patients across a wide spectrum of inpatient or ambulatory settings, while other EHR systems capture elements specific to specialty areas. Specialty EHRs tend to have more depth of data in structured format relevant to specific disease areas such as oncology, cardiovascular, behavioral health, and dermatology. Academic medical centers or specialty hospitals (e.g., children’s hospitals, VA hospitals) may attract certain types of patients and treating providers, so EHR data from these care settings will reflect that patient and provider profile. EHR-derived registries usually capture data on patients with a specific disease and may have limited data fields unless manual abstraction is applied to capture more data elements in structured format. It is important to do your homework and evaluate the criteria listed above when considering different EHR data providers.

There are also trade-offs when considering whether to work with an EHR data originator that is closest to the point of care versus an EHR data aggregator. Data providers that are closest to the point of care may be able to provide source verification and auditability, which is needed for regulatory use cases. However, these sources may require data cleaning and standardization, which adds extra work for the researcher. Data aggregators with curated data sets can make the researcher’s job easier in terms of standardization of the data, although heavily curated data may limit some of the functionality and ability to detect differences across population groups.

We built the OMNY Health platform of curated research-ready de-identified clinical data sourced directly from a diverse set of provider organizations across the United States to make the process of EHR data selection and procurement both flexible and efficient from a contracting perspective. We designed our business operations and data models to address some of the considerations I mentioned above. For example, we work with specialty networks to pull out data that capture additional depth in disease-specific scores and measures that are not generally available. Our health system and specialty provider networks are active partners with OMNY to capture data to support a diverse set of retrospective and prospective research studies, as well as participate in quality initiatives.

How do you think about the use of unstructured data from EHRs? What is hindering further use of unstructured data which can provide a lot of research value?

Unstructured health data in EHR systems is a gold mine of information for understanding the ‘why’ of a patient’s diagnosis and treatment. Clinical notes capture the qualitative perspective from the patient, as well as the rationale behind health provider decisions in treatment selection and treatment changes. Common requests we receive are to understand reasons for changes in therapy, including dosing or therapy choice, documentation of genomic biomarkers, and availability of specific PRO measures. It is challenging to extract this information at scale today, although the advancement in natural language processing (NLP) capabilities is helping to increase the usability of this information. Once extracted, the content of clinical notes is transformed into structured or semi-structured fields that are analysis-ready, and personal health information is removed before the transformed data is combined with the structured data fields of the EHR.
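
To give a feel for the note-to-structured-field step described above, here is a deliberately simplified sketch. It uses hand-written regular expressions as a stand-in for the NLP and ML models used in practice, and the note text, patterns, and field names are all hypothetical. It is not OMNY Health’s pipeline, and pattern-based redaction like this is far from sufficient for real de-identification.

```python
import re

# Toy patterns for a few identifier formats (assumption: US-style formats).
PHI_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "[DATE]"),
]

# Toy pattern for "<action> <drug> ... due to <reason>" statements.
THERAPY_CHANGE = re.compile(
    r"(?P<action>switched|discontinued|increased|decreased)\s+"
    r"(?P<drug>[A-Za-z]+).*?\b(?:due to|because of)\s+(?P<reason>[^.;]+)",
    re.IGNORECASE,
)


def redact(note: str) -> str:
    """Replace matched identifiers with placeholder tags."""
    for pattern, tag in PHI_PATTERNS:
        note = pattern.sub(tag, note)
    return note


def extract_therapy_changes(note: str) -> list[dict]:
    """Turn free-text therapy-change statements into structured records."""
    return [
        {
            "action": m.group("action").lower(),
            "drug": m.group("drug").lower(),
            "reason": m.group("reason").strip(),
        }
        for m in THERAPY_CHANGE.finditer(note)
    ]


# Hypothetical clinical note.
note = (
    "Pt seen 03/12/2021. Discontinued methotrexate due to elevated "
    "liver enzymes. Call 555-123-4567 with questions."
)

clean_note = redact(note)
records = extract_therapy_changes(clean_note)
print(clean_note)
print(records)
```

The structured records can then sit alongside the coded diagnoses, labs, and medications from the same encounter, which is what makes the ‘why’ in the notes usable for analysis.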

As an industry, we have advanced quickly with NLP and ML to mine unstructured notes. This activity is scaling but we’re also moving beyond unstructured text to other sources of unstructured data, such as images. We’re starting to address the challenges of de-identifying and consuming imaging data so that they are analytically meaningful.

What are you most excited for as the EHR data landscape continues to grow and evolve?

It’s exciting to see the growth of new data sources that can be used to connect the dots along the patient journey and outcomes. Linkages of EHR data to other data streams such as claims, registries, social determinants of health (SDOH) data, and now Internet of Things (IoT) data streams through tokenization are opening up new use cases as we strive to build a comprehensive picture of patient care and health outcomes. It is also exciting to contribute toward building out new data sources that incorporate information from under-represented patient populations to understand and address health inequity. I’m a firm believer that collaboration across the broader ecosystem will drive more evidence-based decisions and improve patient lives.
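
The tokenization-based linkage mentioned above can be illustrated with a minimal sketch. It assumes a keyed hash (HMAC-SHA256) over normalized name and date-of-birth fields; the key, field choices, and sample records are hypothetical, and this is not a description of Datavant’s or OMNY Health’s actual tokenization method.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical shared secret; a real deployment would manage keys securely.
SECRET_KEY = b"demo-key-not-for-production"


def normalize(value: str) -> str:
    """Strip accents, case, and punctuation so spelling variants align."""
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return "".join(ch for ch in text.lower() if ch.isalnum())


def make_token(first: str, last: str, dob: str, key: bytes = SECRET_KEY) -> str:
    """Derive a keyed hash that stands in for the patient's identifiers."""
    payload = "|".join([normalize(first), normalize(last), normalize(dob)])
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize(rows: list[dict], keep: list[str]) -> dict:
    """Replace identifying fields with a token, keeping only analytic fields."""
    return {
        make_token(r["first"], r["last"], r["dob"]): {k: r[k] for k in keep}
        for r in rows
    }


# Hypothetical source records held by two different organizations.
ehr_rows = [
    {"first": "María", "last": "O'Neil", "dob": "1980-02-14", "a1c": 7.9},
    {"first": "John", "last": "Smith", "dob": "1975-07-01", "a1c": 6.1},
]
claims_rows = [
    {"first": "MARIA", "last": "ONeil", "dob": "1980-02-14", "paid": 1250.0},
    {"first": "Ann", "last": "Lee", "dob": "1990-11-30", "paid": 310.0},
]

ehr = tokenize(ehr_rows, keep=["a1c"])
claims = tokenize(claims_rows, keep=["paid"])

# Join on tokens only; names and birth dates never appear in the linked data.
linked = [{"token": t, **ehr[t], **claims[t]} for t in ehr.keys() & claims.keys()]
print(linked)
```

Because both sides apply the same normalization and key, the same patient yields the same token in each data set, so records can be joined without exchanging the underlying identifiers.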

Thank you for sharing your insights with me, Stacey!

If you would like to learn more, reach out to Stacey Long or Su Huang. Special thanks to Stella Chang (OMNY Health) and Elenee Argentinis (Datavant) for their review of this post.

Editor’s note: This post has been updated on October 19, 2022 for accuracy and comprehensiveness.
