As the amount of health data in the world increases, so do the privacy needs that must be addressed in order to generate insights from it. However, rapidly evolving rules complicate the compliance process, making world-class privacy expertise essential to maximizing data utility.
Privacy Hub by Datavant has the largest team of statisticians and data scientists, who have collectively performed more than 1,000 HIPAA Expert Determinations.
In this series of interviews, our industry-leading Privacy Experts expound on what their experience with HIPAA Expert Determination has taught them about compliance and the future of health data privacy as a whole.
Anca Ionescu, Senior Data Scientist and Privacy Expert at Privacy Hub, is the subject of the first interview in the series.
An Expert’s Path to Privacy
What is your health data privacy background, and what attracted you to the field?
Introduction to HIPAA Expert Determination
My privacy experience started at Mirador Analytics, one of the companies composed of experts in statistical disclosure risk analysis that is now part of Privacy Hub by Datavant. I come from a statistics background, specializing in Statistics at the college level and finishing a Master’s in Data Science at Lancaster University. Toward the end of my Master’s, Mirador was partnering with my university, and the company advertised a project in which it explained its focus. Having awareness of GDPR [the EU/UK privacy standard] and having done GDPR courses with previous employers, I found Mirador’s work with HIPAA (Health Insurance Portability and Accountability Act) compliance very interesting. I found it stimulating how we can use reference data and other information to come up with a method to quantify the risk of a dataset, which is a bit different from the more commonly used k-anonymity approach [a privacy model in which identifying attributes are generalized or suppressed until every record is indistinguishable from at least k-1 other records on those attributes, so that no single individual can be singled out].
Essentially, I found it interesting how you can use reasonably available information to re-identify someone. Basically, you can be a detective, asking yourself, “If I already have X information about someone, what else is available online or what else can I dig into to re-identify?” Obviously, we privacy experts do it for the purpose of limiting the risk of re-identification, but I found the whole process quite stimulating. Also, with me coming from a statistics background and the quantification of risk done in Expert Determinations all coming from statistical methods, Mirador Analytics [now Privacy Hub] was a great fit for me.
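[Editor’s illustration: the k-anonymity idea mentioned above can be made concrete with a small sketch. This is not Privacy Hub’s method, which the interviewee describes as different from k-anonymity. The records, the column names (`age_band`, `zip3`, `diagnosis`), and the helper `smallest_group_size` are all hypothetical and invented for illustration. The sketch counts how many records share each combination of identifying attributes; the smallest count is the k of the dataset, and k = 1 means at least one person is unique on those attributes.]

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that share
    the same values on the given quasi-identifier columns."""
    if not records:
        return 0
    groups = Counter(
        tuple(record[column] for column in quasi_identifiers)
        for record in records
    )
    return min(groups.values())


# Hypothetical toy dataset; every value here is made up.
records = [
    {"age_band": "30-39", "zip3": "021", "diagnosis": "asthma"},
    {"age_band": "30-39", "zip3": "021", "diagnosis": "diabetes"},
    {"age_band": "40-49", "zip3": "021", "diagnosis": "asthma"},
    {"age_band": "40-49", "zip3": "021", "diagnosis": "flu"},
    {"age_band": "40-49", "zip3": "100", "diagnosis": "asthma"},
]

# Using age band alone, every record has at least one look-alike.
k_coarse = smallest_group_size(records, ["age_band"])

# Adding partial ZIP code isolates one record, so k drops to 1.
k_fine = smallest_group_size(records, ["age_band", "zip3"])

print(k_coarse, k_fine)
```

[The drop in k when a second attribute is added mirrors the “detective” reasoning described above: each extra piece of reasonably available information narrows the set of people a record could belong to.]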
Training Under Privacy Expert With Decades-Long Experience
I started off as an intern at Mirador under the mentorship of Colin Moffatt, Privacy Hub’s Chief Data Scientist and Privacy Expert with over 20 years’ experience in statistics and data analysis. Initially, I had to write my Master’s dissertation as part of my internship, and I was looking at the other end of the spectrum: quantifying utility in the context of HIPAA Expert Determination. So I was analyzing datasets to examine which risk-mitigating strategies yield the highest utility. When I finished my internship and was offered a full-time position, I started looking at privacy.
When I started working at Mirador three years ago, one of the main challenges we experienced was that there was very little research done into quantifying the utility of health data and even less of that when it was in the context of HIPAA. Beyond that, health data is different from other sources of data in that it contains both numeric information and also categorical variables and free text, so it’s quite hard to quantify utility when combining all those elements. Finding data is an additional challenge, given the fact that health data is protected and therefore can’t be found online as easily as other types of data.
Working in Privacy Hub’s HIPAA Expert Determination Team
Today, I am a Senior Data Scientist and Privacy Expert here at Privacy Hub. What does that mean? We try to foster internal knowledge and to keep the whole team at the same level, continuously improving methods and expertise. We also liaise with clients, overseeing a lot of projects, and do internal research for the purpose of designing innovative privacy solutions. And, of course, we do Expert Determinations.
From my years of experience, one of my main takeaways is that there is a very wide variety of use cases, and some of them are incredibly interesting. There are so many purposes out there for health data, and there are so many ways it can help patients. One example was working with wearable technology, like smartwatches. I thought that was quite interesting, as I didn’t realize at that time that that was also health data. Other examples were survival analysis or just seeing how well a drug is performing.
Privacy Hub’s Value
How do you see Privacy Hub by Datavant’s offerings being unique from the rest of the market?
Privacy Hub’s Data Science Team and Internal Research Function
I think what differentiates us is that we’re always working on improving our solutions by trying to use the most up-to-date data, the most cutting-edge technology, and the best statistical methods. The fact that we have so many focus groups (clusters of data scientists dedicated to internal research in specific areas) is a testament to how we prioritize not only the protection of patient privacy but also the optimization of the compliance process. We’re always taking into account new factors that may affect risk of re-identification in the future. For instance, we’re always thinking to ourselves, “Will there be more reasonably available information in the next few years? Will there be more technologies that can re-identify people in the next decade?”
Beyond that, we definitely have a great Data Science team. It’s composed of the smartest individuals, who come from a variety of backgrounds. They have a wide range of academic and employment experiences: our team covers expertise in Statistics, Data Science, Chemistry, Physics, and Mathematics, so it’s very useful to have these areas of knowledge come together when we collaborate. For instance, teammates who have extensive backgrounds in computing and technical subjects are excellent at coding, so they share those skills with the rest; in that same vein, some people have more knowledge of statistics or genetic data, and that’s what they contribute to team education. Having people come from so many different backgrounds definitely helps cultivate internal knowledge.
Tailoring HIPAA Expert Determination to Different Use Cases and Data Types
The diversity of expertise within our team also informs our ability to use a range of privacy-preserving techniques based on different circumstances and use cases. We change our approach based on different data. For instance, we have a method that allows us to see the risk that persists within synthetic data that was created from real health data. We also have ways to assess the risk within genomics data and unstructured data, and we of course have our statistical method for a standard Expert Determination, which is for tabular data (i.e., structured data organized in rows and columns).
We can obviously tailor our approach to any use case. If a client comes to us and says, “We’re really interested in granular patient geography,” we might be able to customize our mitigating strategies so we can allow them to have that information. Similarly, if they want to do a survival analysis, then the death date will be very important to them, so we would work to find a way to minimize risk while still keeping that specific data point.
For organizations that are new to patient privacy, I would definitely suggest speaking to a Privacy Expert. In most cases, if you have a question about whether or not HIPAA applies in a specific context, it more likely than not does.
A Look Ahead
Which areas of health data privacy do you foresee expanding or evolving?
Unlocking the Value of Unstructured Data through HIPAA Expert Determination
A current area that I see expanding within the health data privacy sphere is unstructured data, as most of the world’s health data is unstructured. Although tabular data is more commonly used for data linkage and analyses, it constitutes a much smaller percentage of the health data in the world. There are boundless opportunities that can be unlocked by unstructured data, such as the fact that it allows customers to have the clinical context necessary to better understand the patient journey. However, adding a new modality means that we have to calculate risk between structured and unstructured elements, increasing the time required to do a proper review. Thankfully, with newer technology, like artificial intelligence, we are making unstructured health data analysis and compliance more available. This technology makes it possible to transform unstructured data into structured data, but also enables us to more easily and effectively de-identify unstructured data in its current format. Exploring the information within unstructured data will definitely advance research.
The Future of HIPAA Expert Determination
Aside from that, I see people being more careful. The more you know, the more cautious you become. With everything I’ve learned, I myself am more thoughtful about what I consent to, and I see people being more protective of their health data in the future.
I also think that as data linkage increases, additional investment in privacy will be inevitable. The more data we link, the higher the chances are that we’ll end up with datasets with risky information. So, when linking datasets, it is important to bear in mind that even if you de-identified them separately, you still have to run an Expert Determination on the combination.
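[Editor’s illustration: a minimal sketch of why linked datasets need their own review. It uses a simple group-size count as a stand-in for risk, which is an assumption made for illustration and not the statistical method used in an Expert Determination. The two datasets, the `token` linkage key, the column names, and the helper `smallest_group_size` are all hypothetical.]

```python
from collections import Counter


def smallest_group_size(rows, columns):
    """Size of the smallest group of rows sharing the same values
    on the given columns (1 means someone is unique)."""
    if not rows:
        return 0
    groups = Counter(tuple(row[c] for c in columns) for row in rows)
    return min(groups.values())


# Two hypothetical de-identified datasets that share a linkage token.
claims = [
    {"token": "t1", "age_band": "30-39"},
    {"token": "t2", "age_band": "30-39"},
    {"token": "t3", "age_band": "40-49"},
    {"token": "t4", "age_band": "40-49"},
]
labs = [
    {"token": "t1", "zip3": "021"},
    {"token": "t2", "zip3": "100"},
    {"token": "t3", "zip3": "021"},
    {"token": "t4", "zip3": "100"},
]

# Link the two datasets on the shared token.
labs_by_token = {row["token"]: row for row in labs}
linked = [
    {**row, **labs_by_token[row["token"]]}
    for row in claims
    if row["token"] in labs_by_token
]

# Separately, every record has a look-alike in its own dataset.
k_claims = smallest_group_size(claims, ["age_band"])
k_labs = smallest_group_size(labs, ["zip3"])

# Once linked, every combination of age band and ZIP prefix is unique.
k_linked = smallest_group_size(linked, ["age_band", "zip3"])

print(k_claims, k_labs, k_linked)
```

[In this toy example each dataset looks acceptable on its own, yet the linked result leaves every record unique on the combined attributes, which is why the combination has to be assessed as a new dataset.]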
Overall, for health data privacy and Expert Determinations, I know there’s a lot more coming in the future, because we’re producing more and more health data as we speak. In addition, the public is becoming more aware of the risks of not having their data protected, so I only see this field developing further.
Privacy Hub by Datavant
Privacy Hub by Datavant is the leader in privacy preservation of health data, offering independent HIPAA de-identification analyses and Expert Determinations as well as advanced technologies and solutions to improve the quality, speed, and verifiability of the compliance process.