Securing America’s Health Data with Privacy-Enhancing Technology

Publish Date: December 8, 2025

As digital health outpaces regulation, tokenization offers a scalable path to privacy and trust, enabling advancements that support regulatory, patient, and consumer priorities.

Healthcare organizations, life sciences innovators, and many others are working to build a health information ecosystem that weaves together patient data, such as clinical records, insurance claims, and information from wearable devices, into a unified, patient-centered network. These initiatives often aim to accelerate interoperability in order to enable breakthroughs in clinical care and research or to reduce healthcare costs.

There are shared concerns that this rapidly expanding digital health landscape—which includes AI developers, large technology companies, third-party app developers, and other new and non-traditional health industry participants—is potentially outpacing the regulatory guardrails designed to protect it.

We’re confronted with a central policy challenge: how do we unlock the immense potential value of health data while addressing valid concerns over patient privacy and data misuse? 

The answer to this challenge requires pairing sound regulation with technical innovation.

The first step is to integrate proven Privacy-Enhancing Technologies (PETs) such as tokenization into the nation’s health data infrastructure. In doing so, policymakers can align privacy goals with operational reality, ultimately creating systems that are secure, trustworthy, and interoperable. (Briefly: tokenization replaces direct patient identifiers with irreversible tokens so that records can be linked for care and research without exposing the underlying identifiers; more on this below.)

Where Current Rules Fall Short

The current policy conversation around health data privacy is driven by several known challenges that undermine consumer trust and create regulatory uncertainty:

  • The HIPAA Gap: The Health Insurance Portability and Accountability Act (HIPAA) provides a strong floor for protecting health information held by regulated entities, including “covered entities” (such as hospitals, health plans, and providers) and their “business associates.” However, once that data moves to a third-party health app, consumer device, or other entity not regulated by HIPAA, it is generally no longer PHI governed by HIPAA. In such cases, other regimes may apply (e.g., the FTC Health Breach Notification Rule, 16 C.F.R. Part 318, and state privacy laws). This leaves vast amounts of sensitive information, from mental health notes to records of reproductive health services, in a regulatory “wild west” where it can be used or sold with far fewer rules.
  • The Risk of Health Surveillance: The aggregation of sensitive health data creates significant risks of misuse that could undermine patient and consumer confidence. Experts warn of scenarios where insurers could use data to raise premiums, employers could make hiring decisions based on aggregated fertility data, or state prosecutors could leverage location-tagged health data in legal proceedings. These risks underscore the need for privacy-by-design technical safeguards rather than reactive policy measures alone.
  • Regulatory Fragmentation: The lack of a comprehensive federal privacy law has led to a patchwork of state-level regulations. This fragmentation creates untenable regulatory uncertainty for organizations, increases compliance costs, and results in inconsistent data protections for Americans depending on the state they reside in. It also creates ambiguity about when de-identified data are regulated and how cross-jurisdictional exchanges should be handled. Ultimately, this erodes patients' trust that their data are adequately and comprehensively safeguarded.

Privacy-by-Design with Tokenization and Privacy-Preserving Record Linkage

Privacy-Enhancing Technologies (PETs) offer a class of technical solutions designed to minimize data exposure while preserving its value for care, research, and public health. Among these, tokenization stands out as a proven and widely adopted method to support privacy-preserving record linkage (PPRL) and is already in use across health systems, research networks, and regulated data exchanges.

What Tokenization Is and How It Works 

Tokenization is a process that replaces sensitive identifiers (such as names, dates of birth, or Social Security numbers) with encrypted, irreversible tokens. These tokens can be used to link patient records across systems without ever exposing the underlying direct identifiers (under HIPAA, these are identifiers of Protected Health Information, or PHI; outside HIPAA, similar identifiers are often referred to as personally identifiable information, or PII). In HIPAA terms, tokenization facilitates creation of data sets that can meet de-identification standards under 45 C.F.R. § 164.514 where appropriate.
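
To make the idea concrete, here is a minimal Python sketch of one-way tokenization. It is an illustration under stated assumptions, not Datavant's implementation: it assumes identifiers are normalized and then passed through a keyed hash (HMAC-SHA-256). The `normalize` and `make_token` helpers, the `EXAMPLE_SECRET` value, and the choice of name plus date of birth as inputs are all hypothetical.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical secret for the keyed hash; in practice this would come from a
# key-management system, never from source code.
EXAMPLE_SECRET = b"example-only-secret"


def normalize(value: str) -> str:
    """Canonicalize an identifier so formatting differences do not change the token."""
    decomposed = unicodedata.normalize("NFKD", value)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep only letters and digits, upper-cased ("O'Neil" -> "ONEIL").
    return "".join(ch for ch in without_accents.upper() if ch.isalnum())


def make_token(first: str, last: str, dob: str, secret: bytes = EXAMPLE_SECRET) -> str:
    """Return a keyed one-way hash of the normalized identifiers (assumes year-month-day dates)."""
    message = "|".join(normalize(part) for part in (first, last, dob)).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


token_a = make_token("José", "O'Neil", "1980-01-02")
token_b = make_token(" jose ", "ONEIL", "1980/01/02")
token_c = make_token("Maria", "O'Neil", "1980-01-02")
print(token_a == token_b, token_a == token_c)
```

In this sketch, two differently formatted entries for the same person produce the same token, while a different person produces a different one, and the hash offers no direct way to decode a token back into a name or birth date.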

This process creates site-specific tokens, meaning a single patient has a different encrypted token in each data holder's system. This is achieved by applying a site-specific encryption key to a "Master Token," which is itself created through an irreversible hash of the patient's direct identifiers. The software can be installed and run locally behind an organization's firewall, so sensitive patient information never leaves its secure environment. When cross-organizational data linkage is needed, a controlled token transformation connects patients across datasets; it requires bilateral approval and is governed by audit logging and least-privilege access.
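
The two-layer design described above can be sketched in Python as follows. This is an illustration under stated assumptions, not Datavant's patented implementation. The article does not say which cipher is used, so the sketch substitutes a small four-round Feistel construction built from HMAC-SHA-256 for the site-specific encryption step; a production system would use a vetted deterministic cipher and managed keys. All names here (`master_token`, `site_encrypt`, `site_decrypt`, `transform`, the site labels, and the keys) are hypothetical.

```python
import hashlib
import hmac

ROUNDS = 4  # illustrative choice for the stand-in Feistel cipher
MASTER_SECRET = b"example-only-master-secret"  # hypothetical; use managed keys in practice


def master_token(identifiers: str) -> bytes:
    """Irreversible 32-byte keyed hash of already-normalized direct identifiers."""
    return hmac.new(MASTER_SECRET, identifiers.encode("utf-8"), hashlib.sha256).digest()


def _round(key: bytes, i: int, half: bytes) -> bytes:
    """Keyed round function: first 16 bytes of an HMAC over round number and half-block."""
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:16]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def site_encrypt(site_key: bytes, master: bytes) -> bytes:
    """Deterministically encrypt a master token under one site's key."""
    left, right = master[:16], master[16:]
    for i in range(ROUNDS):
        left, right = right, _xor(left, _round(site_key, i, right))
    return left + right


def site_decrypt(site_key: bytes, token: bytes) -> bytes:
    """Invert site_encrypt; this requires that site's key."""
    left, right = token[:16], token[16:]
    for i in reversed(range(ROUNDS)):
        left, right = _xor(right, _round(site_key, i, left)), left
    return left + right


def transform(token, source, recipient, keys, approvals, audit_log):
    """Re-key a token from source to recipient, only if both parties approved the link."""
    if approvals.get((source, recipient), set()) != {source, recipient}:
        raise PermissionError("both source and recipient must approve the linkage")
    audit_log.append((source, recipient))  # record every transformation
    return site_encrypt(keys[recipient], site_decrypt(keys[source], token))


keys = {"hospital": b"hospital-key", "pharmacy": b"pharmacy-key"}
patient = master_token("JANE|DOE|19800102")
hospital_token = site_encrypt(keys["hospital"], patient)
pharmacy_token = site_encrypt(keys["pharmacy"], patient)

approvals = {("hospital", "pharmacy"): {"hospital", "pharmacy"}}
audit_log = []
linked = transform(hospital_token, "hospital", "pharmacy", keys, approvals, audit_log)
print(hospital_token != pharmacy_token, linked == pharmacy_token)
```

The hospital and pharmacy tokens for the same patient differ, so the two datasets cannot be joined directly. Only an approved, logged call to `transform` re-keys one site's token into the other's, which is the point at which records can be matched.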

How Tokenization Addresses Key Policy Risks

  • Enhances Data Control: Linking data between two organizations requires a token transformation process. This requires permission from both the data source and the recipient to transform their tokens for matching. This feature ensures data custodians remain in full control over who can access and link their data. Further, controls can be enforced contractually and technically (e.g., allow lists, time-bound credentials, and auditable logs).
  • Contains Security Breaches: Because each organization uses a unique encryption key, a security breach at one site does not compromise the tokens or direct identifiers/PHI at any other location. The affected site could simply be issued a new encryption key and re-tokenize its data, effectively eliminating the exposure from the breach. Per-site keys and key rotation provide both compartmentalization and rapid containment.
  • Enables HIPAA Compliance (and Beyond): Tokenization is specifically designed to enable the creation of de-identified datasets. It complements both HIPAA's prescriptive "Safe Harbor" method (45 C.F.R. § 164.514(b)(2)) and the "Expert Determination" method (§ 164.514(b)(1)), which allows for greater data utility in research while still meeting HIPAA's high standard for de-identification. Under HIPAA, data that have been de-identified pursuant to § 164.514(a) are not PHI. This means regulated entities can responsibly use de-identified data in ways that advance care, operations, and innovation while maintaining compliance with federal standards.
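
As a rough illustration of the Safe Harbor approach mentioned in the last point above, the Python sketch below applies a few of its rules to a single record: dropping direct identifiers, reducing a birth date to a year (or to a "90+" category for ages over 89), and truncating a ZIP code to its first three digits. It is deliberately partial, since Safe Harbor covers 18 categories of identifiers and carries further conditions, and it is not a compliance tool. The field names, the `DIRECT_IDENTIFIERS` and `populous_zip3` sets, and the `token` field are hypothetical.

```python
from datetime import date

# Hypothetical field names for a few of the direct identifiers Safe Harbor removes.
DIRECT_IDENTIFIERS = {"name", "ssn", "email", "phone", "street_address", "mrn"}


def safe_harbor_sketch(record: dict, today: date, populous_zip3: set) -> dict:
    """Apply a small subset of Safe Harbor-style generalizations to one record."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    dob = out.pop("date_of_birth", None)
    if dob is not None:
        had_birthday = (today.month, today.day) >= (dob.month, dob.day)
        age = today.year - dob.year - (0 if had_birthday else 1)
        if age > 89:
            out["age"] = "90+"  # ages over 89 are aggregated; birth year is dropped
        else:
            out["birth_year"] = dob.year  # keep only the year of the date

    zip_code = out.pop("zip", None)
    if zip_code is not None:
        zip3 = zip_code[:3]
        # Three-digit ZIP areas with 20,000 or fewer people are replaced by "000";
        # the caller supplies the set of sufficiently populous prefixes.
        out["zip3"] = zip3 if zip3 in populous_zip3 else "000"
    return out


record = {
    "name": "Jane Doe",
    "ssn": "123-45-6789",
    "date_of_birth": date(1980, 1, 2),
    "zip": "94107",
    "diagnosis": "J45",
    "token": "abc123",  # a linkage token travels with the de-identified record
}
result = safe_harbor_sketch(record, date(2025, 12, 8), {"941"})
print(result)
```

The linkage token stays on the record after the direct identifiers are removed, which is what allows de-identified records to be matched later without reintroducing names or Social Security numbers.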

Datavant’s patented implementation operationalizes these principles by generating consistent tokens for the same individual, enabling high-quality record linkage across disparate systems (e.g., connecting a hospital encounter with subsequent pharmacy fills) without exposing direct identifiers.

Datavant is tokenizing data in over 270 clinical trials, including Phase I and Phase II studies, enabling long-term follow-up, real-world evidence generation, and regulatory applications without exposing PHI/PII.

Policy Actions: Operationalize Tokenization and PETs at the Federal Level

To build a digital health system that is both trusted and resilient, federal policy must establish clear expectations for the use of PETs.

  1. Incentivize and Require PETs in High-Risk Scenarios: Federal privacy legislation and government-led programs should require the use of PETs like tokenization whenever patient-level data is shared across different organizations, particularly in public-private partnerships and large-scale research studies. These technologies have already demonstrated value in safeguarding patient privacy in multi-institutional data collaborations. 
  2. Establish a Unified Federal De-identification Standard: Congress should codify a national de-identification standard based on HIPAA’s Expert Determination method. A unified federal standard would provide regulatory certainty for innovators who rely on de-identified data for critical research, while ensuring robust and consistent safeguards regardless of state residency. 
  3. Build a Foundation of Trust: Embedding privacy-preserving technologies into the national health data infrastructure is essential for restoring confidence among clinicians, researchers, and patients. Trust is not a byproduct of policy promises; it is a prerequisite for the kind of data sharing that drives innovation and improves health outcomes.

The goal is not simply to promise privacy, but to deliberately engineer it into the systems we build. U.S. policymakers should collaborate with technical experts and standards bodies to formally integrate tokenization and other proven PETs into federal health data frameworks. Implementation guidance (e.g., procurement criteria, grant conditions, and technical profiles) should specify required tokenization capabilities and controls. This will translate the nation’s privacy principles into enforceable, operational safeguards.

By doing so, we can ensure the United States builds a system that is secure, trustworthy, and prepared for the future.

Learn more about Datavant’s approach to maximizing data utility while protecting patient privacy.
