We are excited to announce two things: the launch of the Datavant Privacy Hub, one of the core components of Datavant’s Switchboard, and the acquisition of Mirador Analytics, the leading provider of HIPAA expert determination services.
This is a huge step forward in our mission to connect the world’s health data to improve patient outcomes. It will also unlock a step change in how technology is applied to both preserve health data utility and protect patient privacy.
This post will explore the problems Datavant solves for customers and patients, what Privacy Hub is, why we acquired Mirador, and how we see the landscape of privacy-preserving technology evolving over time.
The State of Patient Privacy Protection in Health Data Exchange
Across the country, thousands of organizations are trying to bring together disparate de-identified health datasets for analytics. Whether sponsoring an oncology registry, building a model to find rare disease patients, or trying to understand the quality and cost of healthcare, one of the major challenges that organizations run into is ensuring health data is adequately de-identified to protect patient privacy.
Patient de-identification that preserves health data utility is an extremely complex process. For example — imagine you want to answer a relatively simple question: “What is the hospitalization rate of patients aged 60-70 who received the Moderna vaccine 6 months ago, stratified by comorbidities?” Answering this question requires bringing together data from several different sources (vaccination data, hospitalization data, comorbidity data). Doing so without including information that can be used to identify a patient requires statistical analysis in order to understand the uniqueness of different elements included in the dataset, and how they might be used in combination to re-identify a patient. Complexity increases further when you add elements like genetic markers that might be used to predict COVID-19 risk.
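One common way to quantify the "uniqueness" described above is to group records by their combination of quasi-identifiers (fields that are not identifying on their own but can be in combination) and look at the size of each group, as in k-anonymity. The sketch below is a minimal illustration of that idea, assuming hypothetical toy records and field names; it is not a description of how Privacy Hub or Mirador actually measure risk.

```python
from collections import Counter

# Hypothetical toy records; field names are illustrative assumptions only.
records = [
    {"age": 64, "zip3": "021", "sex": "F"},
    {"age": 64, "zip3": "021", "sex": "F"},
    {"age": 67, "zip3": "021", "sex": "M"},
    {"age": 61, "zip3": "945", "sex": "M"},
]


def equivalence_classes(rows, quasi_identifiers):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def uniqueness_report(rows, quasi_identifiers):
    """Return (k, share_unique): the smallest group size, and the fraction
    of rows that are the only member of their group."""
    classes = equivalence_classes(rows, quasi_identifiers)
    k = min(classes.values())
    unique_rows = sum(1 for size in classes.values() if size == 1)
    return k, unique_rows / len(rows)


# A single field can be common while a combination of fields singles people out.
print(uniqueness_report(records, ["sex"]))
print(uniqueness_report(records, ["age", "zip3", "sex"]))
```

Here sex alone leaves every record in a group of two, while age, 3-digit ZIP, and sex together make half of the records unique, which is exactly the kind of combination effect an expert has to evaluate.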
When considering the need to de-identify patient-level data, there is a trade-off between data utility and patient privacy preservation. One way of visualizing this is an efficient frontier: as more data is removed, the remaining health data becomes less useful for analysis. Conversely, when data utility is increased, patient privacy risk goes up. The key thing to understand is that there is a direct trade-off only if we are actually on that efficient frontier.
In practice, I believe it is possible to improve patient privacy and health data utility simultaneously, because we are nowhere close to the efficient frontier. The way we get there is with advanced, expert-informed technologies that address the differing needs of patient privacy protection and research together rather than treating them as competing goals: faster, more seamless de-identification processes; more analytics-friendly data; and complete patient privacy preservation.
HIPAA and How De-Identification is Applied
Under HIPAA, most organizations today rely on a method called “expert determination” to legally de-identify patient data. Under this method, a dataset is considered adequately de-identified if an expert assesses the data and determines that the risk of re-identification is “very small.” The benefit of this approach is that it is flexible and can support much of the valuable analytical work being performed across the healthcare system today. The downside is that it can be slow, with severe bottlenecks.
Furthermore, experts often lack the technology needed to carry out any required remediations, validations, or ongoing monitoring. The result is delays, less of the data analysis that could improve patient outcomes, and inconsistent protection of patient privacy across the healthcare industry.
Privacy Hub and the Acquisition of Mirador Analytics
We believe there is a better way.
Today, it is hard to imagine health data being exchanged and connected across organizations at scale if every new combination of health data requires a six-month wait for expert determination. In the future we envision, patient privacy protection and disclosure risk assessments should work like this:
- Re-identification risk is assessed in real time using widely adopted tests and standards as new datasets are being configured and joined, allowing data scientists to assess the necessary trade-offs that will support robust analysis while protecting patient privacy.
- Once configured, the necessary identifiable elements are automatically removed or transformed to de-identify the patient data.
- For ongoing data feeds, automated monitoring is conducted on a continuous basis to ensure continued compliance with the original certification.
- Automated tests, standards, and advanced technologies incorporate the learnings from all datasets previously assessed with the same methods and determined to have a “very small” risk of re-identification.
- Privacy-preserving technologies (such as homomorphic encryption and differential privacy) go well beyond redaction and hashing to protect patient privacy.
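The configure, transform, and check steps above can be pictured with a small sketch. It assumes a hypothetical feed with direct identifiers (`name`, `mrn`) that are removed, an exact age that is coarsened into a 10-year band, and a hypothetical minimum group size `K_THRESHOLD`; HIPAA does not prescribe a numeric threshold, and none of these choices describe Privacy Hub's actual rules.

```python
from collections import Counter

# Illustrative assumptions only: the identifier lists, the 10-year age bands,
# and the k threshold are hypothetical, not HIPAA or Privacy Hub requirements.
DIRECT_IDENTIFIERS = {"name", "mrn"}
QUASI_IDENTIFIERS = ("age_band", "zip3", "sex")
K_THRESHOLD = 2


def transform(row):
    """Drop direct identifiers and replace exact age with a 10-year band."""
    out = {key: value for key, value in row.items() if key not in DIRECT_IDENTIFIERS}
    low = (out.pop("age") // 10) * 10
    out["age_band"] = f"{low}-{low + 9}"
    return out


def qi_key(row):
    """The combination of quasi-identifier values for one row."""
    return tuple(row[q] for q in QUASI_IDENTIFIERS)


def deidentify(rows, k=K_THRESHOLD):
    """Transform rows, then suppress any row whose quasi-identifier
    combination is shared by fewer than k rows. Returns (kept, suppressed)."""
    transformed = [transform(row) for row in rows]
    sizes = Counter(qi_key(row) for row in transformed)
    kept = [row for row in transformed if sizes[qi_key(row)] >= k]
    return kept, len(transformed) - len(kept)


# Hypothetical records for illustration.
feed = [
    {"name": "A", "mrn": "001", "age": 64, "zip3": "021", "sex": "F", "hospitalized": True},
    {"name": "B", "mrn": "002", "age": 61, "zip3": "021", "sex": "F", "hospitalized": False},
    {"name": "C", "mrn": "003", "age": 67, "zip3": "021", "sex": "M", "hospitalized": False},
]
rows, suppressed = deidentify(feed)
print(rows)
print("suppressed:", suppressed)
```

In this toy feed the two women in their sixties share a group and are kept, while the single man in that ZIP prefix would be unique and is suppressed. A real expert determination weighs many more factors, such as the recipient's environment and other available data, than a single group-size check.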
Privacy Hub will begin by streamlining and automating major components of the expert determination process, creating more consistency in the standards applied to protect patient privacy, and shortening the time needed to obtain useful, connected data. Privacy Hub will be usable by any data source, data recipient, or independent expert.
Mirador Analytics, the leading HIPAA expert determination company, shares this belief. Their team of experts has consulted on hundreds of datasets for leading pharmaceutical, insurance and data analytics companies. Mirador is known for its rigorous approach to protecting patient privacy, while maximizing data utility to allow for innovation, efficiency, and development in healthcare.
We have partnered closely for years and now, with the launch of Privacy Hub, have come together around a shared vision of expert-informed privacy-preserving technologies for the industry.
What the Future Holds
One of the biggest challenges for the health industry is how to connect health data across institutions safely so that it can be used at scale. To accomplish this goal, organizations will need to use multiple privacy-preserving approaches: redaction and hashing today, but ultimately frontier technologies as well, such as differential privacy, secure multi-party computation, and homomorphic encryption. These technologies can be deployed as part of a holistic patient privacy toolkit, matched to the analysis and workflow at hand.
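To make one of these frontier technologies concrete, here is a minimal sketch of the Laplace mechanism, a standard way to release a count under differential privacy. Adding or removing one patient changes a count by at most 1, so adding Laplace noise with scale 1/epsilon yields epsilon-differential privacy for that single query. The cohort data and the epsilon value are hypothetical, and this is not how Privacy Hub or any partner implements differential privacy; production systems typically rely on audited libraries and track a privacy budget across queries.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two independent
    exponential draws, each with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_count(values, predicate, epsilon, rng=None):
    """Release a count with epsilon-differential privacy via the Laplace
    mechanism. A counting query has sensitivity 1, so the noise scale is
    1 / epsilon: smaller epsilon means more noise and stronger privacy."""
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1 / epsilon, rng)


# Hypothetical cohort: True marks a hospitalized patient.
hospitalized = [True, False, True, True, False, False, True]
noisy = dp_count(hospitalized, lambda h: h, epsilon=1.0, rng=random.Random(0))
print(round(noisy, 2))
```

The released value is close to the true count of 4 on average, but any single answer is randomized, so the presence or absence of one patient cannot be confidently inferred from it.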
As these technologies continue to mature, they will offer new ways to protect patient privacy and enable the use of health data across the healthcare system. Datavant’s Privacy Hub will work with best-of-breed partners to bring the best technology available to protect patient privacy for a specific use case, whether doing analytics on top of a large synthetic dataset to inform drug development or analyzing clinical data in a fully-encrypted form.
In this future, everyone wins: the healthcare system gets smarter, patient privacy protections are strengthened and, most importantly, patient outcomes improve.
Editor’s note: This post was updated on October 24, 2022 for accuracy and comprehensiveness.