How AI Is Powering Smarter, Privacy-Preserving Data Discovery for Life Sciences
Life sciences organizations have long struggled with fragmented, time-intensive processes for discovering, assessing, and purchasing real-world data (RWD). These bottlenecks delay evidence generation, slow clinical and commercial decision-making, and increase the cost of innovation.
Over the past year, leading organizations have taken meaningful steps to address this challenge. A major milestone was the launch of Datavant Connect powered by AWS Clean Rooms, a cloud-first solution developed through collaboration between Datavant, AWS, and nearly 20 partners across the pharmaceutical and real-world data ecosystem.
Now, Datavant and AWS are building on this foundation with Explore Assistant, an AI-powered capability that brings intelligence, speed, and accessibility directly to the data discovery experience, while preserving privacy and governance.
Explore Assistant was previewed at AWS re:Invent 2025; watch the session recording here.
The Real-World Data Discovery Problem
Despite the growing importance of real-world evidence, data discovery remains inefficient for both data buyers and data sources.
For data buyers, common challenges include slow, manual workflows; uncertain feasibility; and high cost and wasted effort. It can take months or even years to identify suitable data, and up to 40% of data purchases turn out to be incomplete or misaligned with study needs [1].
For data sources, the challenge lies in showcasing the richness of their data while maintaining strict control, privacy, and governance. Supporting repeated, bespoke feasibility requests consumes significant resources yet often fails to translate into commercial outcomes.
Explore Assistant addresses challenges on both sides of the health data ecosystem.
- For data buyers, it enables deeper feasibility earlier, allowing teams to determine whether a dataset meets clinical or commercial needs before purchase.
- For data sources, it enables passive, privacy-safe discoverability, which reduces pre-sales burden while driving higher-quality inbound demand.
Meet Explore Assistant
Explore Assistant is an agentic AI capability within Datavant Connect powered by AWS Clean Rooms, designed to make deep data feasibility assessments more accessible and intuitive.
Built for non-technical users, Explore Assistant allows teams to ask natural-language questions that are interpreted and routed to specialized agents for metadata analysis, query construction, and clinical definition generation. Leveraging large language models through Amazon Bedrock, the system translates questions into analytically rigorous feasibility assessments.
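To make the routing idea concrete, here is a minimal sketch in Python. It is not Explore Assistant's implementation: the agent functions, intent names, and keyword rules are all hypothetical, and simple keyword matching stands in for the LLM-based interpretation described above.

```python
from typing import Callable, Dict, Tuple

# Hypothetical agents mirroring the three roles named in the text.
def metadata_agent(question: str) -> str:
    return f"[metadata analysis] {question}"

def query_agent(question: str) -> str:
    return f"[query construction] {question}"

def definition_agent(question: str) -> str:
    return f"[clinical definition] {question}"

AGENTS: Dict[str, Callable[[str], str]] = {
    "metadata_analysis": metadata_agent,
    "query_construction": query_agent,
    "clinical_definition": definition_agent,
}

# Assumption: keyword rules replace the language model that would
# classify intent in a real system. Rules are checked in order.
ROUTING_RULES = [
    ("clinical_definition", ("define", "definition", "code list")),
    ("query_construction", ("how many", "number of", "cohort size")),
]

def classify(question: str) -> str:
    """Return the intent for a question, defaulting to metadata analysis."""
    text = question.lower()
    for intent, keywords in ROUTING_RULES:
        if any(keyword in text for keyword in keywords):
            return intent
    return "metadata_analysis"

def route(question: str) -> Tuple[str, str]:
    """Classify the question and hand it to the matching agent."""
    intent = classify(question)
    return intent, AGENTS[intent](question)

if __name__ == "__main__":
    for q in (
        "Does the dataset include lab results?",
        "How many patients have type 2 diabetes?",
        "Define heart failure using diagnosis codes",
    ):
        print(route(q))
```

The default route is metadata analysis because it is the cheapest path; only questions that clearly need a count or a clinical definition are sent to the more specialized agents.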
All outputs are guided by the Structured Process for Identification of Fit-for-Purpose Data (SPIFD) framework [2].
The Secure Foundation Behind AI-Driven Data Discovery
Explore Assistant is built on a privacy-first foundation combining Datavant’s tokenization with AWS Clean Rooms. With tokenization, sensitive identifiers are replaced with non-deterministic, irreversible tokens, allowing secure matching across datasets and creating a more complete longitudinal view while maintaining strict privacy standards.
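As a rough illustration of how one-way tokens can stand in for identifiers when matching records, consider the sketch below. It is a deliberate simplification and not Datavant's tokenization scheme: it uses a plain keyed hash (HMAC-SHA-256), and the key, field names, and sample records are invented for the example.

```python
import hashlib
import hmac

def normalize(first: str, last: str, dob: str) -> str:
    """Canonicalize identifier fields so trivial formatting differences still match."""
    return "|".join(part.strip().lower() for part in (first, last, dob))

def tokenize(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed one-way hash (assumed technique)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical key; a real deployment would manage keys securely.
KEY = b"demo-key-not-for-production"

# Two invented datasets that hold tokens instead of names or birth dates.
claims = {tokenize(normalize("Ana", "Diaz", "1980-02-14"), KEY): {"dx": "E11.9"}}
labs = {tokenize(normalize(" ana", "DIAZ ", "1980-02-14"), KEY): {"hba1c": 7.2}}

# Records are linked on the token alone; no raw identifier is compared.
matched = claims.keys() & labs.keys()

if __name__ == "__main__":
    for token in matched:
        print(token[:12], claims[token], labs[token])
```

The point of the sketch is only that linkage can happen on tokens rather than on names or dates of birth, which is what allows a longitudinal view to be assembled without exposing the identifiers themselves.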
AWS Clean Rooms
AWS Clean Rooms is an analytics service that helps companies and their partners more easily and securely analyze and collaborate on their collective datasets, without sharing or copying each other’s underlying data. Using AWS Clean Rooms, you can create a secure data clean room and collaborate with any other company on AWS to generate unique insights that help inform clinical or commercial use cases for life sciences companies. Datavant Connect powered by AWS Clean Rooms enables secure discovery and evaluation, as no underlying data moves from each data owner’s cloud environment, while enabling scale for large and complex datasets.
Benefits of Explore Assistant for data sources and data buyers include:
1. Deeper Feasibility for Data Buyers: Explore Assistant allows data buyers to evaluate data suitability before purchase, without sacrificing rigor or privacy.

By asking natural-language questions, buyers can assess whether a dataset contains the required data elements, clinical characteristics, or cohort size needed for a specific use case. When metadata alone is sufficient, Explore Assistant returns immediate answers. When deeper analysis is required, it automatically constructs feasibility or cohort queries and executes them securely within AWS Clean Rooms.

This replaces weeks of manual back-and-forth with multiple data providers, compressing a traditionally lengthy, email-driven discovery process into minutes, and giving buyers greater confidence in their data purchasing decisions.
2. Enhanced Discoverability for Data Sources: Explore Assistant also transforms how data sources bring their assets to market.
By registering tokenized datasets within their own AWS environment, data sources become passively discoverable to potential buyers—without moving or exposing underlying data. Explore Assistant surfaces available data tables, ownership, table types, and permissions, allowing buyers to evaluate options securely and efficiently.

This increases the commercial value of data assets while improving margins. Data sources benefit from reduced feasibility backlogs, fewer bespoke evaluations, and increased inbound demand. Aggregated insights into buyer question patterns further help sources understand market interest—while built-in guardrails ensure all queries remain safe, limited, and compliant.
3. Privacy, Governance, and Trust: All Explore Assistant analyses are executed within AWS Clean Rooms and governed by strict controls that prevent re-identification risk.
For questions spanning multiple data sources, Explore Assistant initiates secure, multi-party clean room collaborations, preserving each contributor’s data controls while enabling cross-dataset insights.
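The feasibility flow described in the list above (answer from metadata when possible, fall back to a governed cohort query otherwise) can be sketched as follows. This is an illustrative sketch only: the catalog, the toy rows, the function names, and the minimum cell size of 11 are assumptions, and an in-memory list stands in for a query that would actually run inside a clean room.

```python
from typing import Optional, Set

# Assumed reporting threshold; small cohorts are suppressed rather than reported.
MIN_CELL_SIZE = 11

# Hypothetical catalog metadata that a data source might register.
CATALOG = {
    "claims_2023": {"columns": {"token", "diagnosis_code", "service_date"}},
}

# Toy rows standing in for data that would stay in the owner's environment.
ROWS = {
    "claims_2023": [
        {"token": f"t{i}", "diagnosis_code": "E11.9" if i < 25 else "I50.9"}
        for i in range(30)
    ],
}

def has_columns(dataset: str, required: Set[str]) -> bool:
    """Metadata-only check: are all required columns present?"""
    return required <= CATALOG[dataset]["columns"]

def cohort_count(dataset: str, diagnosis_code: str) -> Optional[int]:
    """Count distinct tokens for a diagnosis; return None below the threshold."""
    tokens = {
        row["token"]
        for row in ROWS[dataset]
        if row["diagnosis_code"] == diagnosis_code
    }
    return len(tokens) if len(tokens) >= MIN_CELL_SIZE else None

def assess(dataset: str, required: Set[str], diagnosis_code: Optional[str] = None) -> str:
    """Answer from metadata when possible; otherwise run the aggregate query."""
    if not has_columns(dataset, required):
        return "not feasible: required columns missing"
    if diagnosis_code is None:
        return "feasible on metadata alone"
    count = cohort_count(dataset, diagnosis_code)
    if count is None:
        return f"cohort below reporting threshold of {MIN_CELL_SIZE}"
    return f"cohort size: {count}"

if __name__ == "__main__":
    print(assess("claims_2023", {"diagnosis_code"}))
    print(assess("claims_2023", {"lab_value"}))
    print(assess("claims_2023", {"diagnosis_code"}, "E11.9"))
    print(assess("claims_2023", {"diagnosis_code"}, "I50.9"))
```

Returning only aggregate counts, and suppressing counts below a threshold, is one common way to limit re-identification risk; the actual controls applied by Explore Assistant are not specified in this article.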
A Faster, Smarter Way to Discover Real-World Data
Explore Assistant redefines what’s possible in real-world data discovery. By uniting AI-driven intelligence with cloud-first privacy and secure collaboration, Datavant and AWS are changing how life sciences companies find the right data.
Whether you’re looking to make your data discoverable or accelerate data discovery for your next project, Explore Assistant can help.
Disclaimer: Explore Assistant utilizes generative AI to support data discovery. All outputs should be reviewed by a subject matter expert for accuracy.
Contact your Datavant or AWS representative, or get in touch with our team to learn more.
References
1. Based on a survey conducted by Datavant with life sciences customers. The findings were corroborated by 13 leaders from life sciences and biotech who participated in the AWS HCLS Data Forum held in June and August 2023.
2. Gatto, N. M., Vititoe, S. E., Rubinstein, E., Reynolds, R. F., & Campbell, U. B. (2023). A Structured Process to Identify Fit-for-Purpose Study Design and Data to Generate Valid and Transparent Real-World Evidence for Regulatory Uses. Clinical Pharmacology & Therapeutics, 113(6), 1235–1239. https://doi.org/10.1002/cpt.2883

