In our Data & Analytics Thought Leader Series, Datavant’s Head of Data Strategy, Su Huang, interviews leaders who are responsible for defining the teams and processes for managing data and advancing data-driven use cases at their organization. Today’s interview is with Anthony Philippakis, Chief Data Officer at the Broad Institute.
Anthony is committed to bringing genome sequencing and data science into clinical practice. He started his career as a cardiologist at Brigham and Women’s Hospital. Motivated by a desire to build scalable change in healthcare, he moved into technology, first as a product manager at the Broad Institute and later becoming Chief Data Officer (CDO). As CDO, Anthony built the 250-person Data Sciences Platform team to manage data at scale and fuel the next wave of discoveries in the biomedical research community. Currently, Anthony co-directs the Eric and Wendy Schmidt Center (EWSC), which is focused on advancing research at the interface of machine learning and biomedicine. Anthony also builds and invests in new companies as a Venture Partner at GV.
Su: Anthony, thank you so much for joining us for this interview series with data and analytics leaders in healthcare. You have an amazing array of experiences, from being a physician by training to venture investing to building a software organization at Broad. Can you give us an overview of the Broad Institute and your remit there as Chief Data Officer?
Anthony: The Broad Institute of MIT and Harvard was formed in the wake of the human genome project, which was a time of significant change in biological research on two fronts. First, there was interest in large-scale, systematic approaches to biology. Second, there was appreciation that bringing people with diverse skillsets together could be transformative.
The Broad Institute embodies these two philosophies — (1) taking large-scale, systematic approaches to biology and (2) bringing together diverse research teams to execute on them.
My current focus as Chief Data Officer (CDO) is building the Eric and Wendy Schmidt Center (EWSC) in partnership with Caroline Uhler (a professor at MIT and leader of machine learning) and leading a team of data scientists to advance machine learning applications for medicine and biology. In addition to that, I am a member of the Institute’s Executive Leadership Team and continue to work closely with the Data Sciences Platform (DSP), currently led by Clare Bernard.
Su: Can you describe more about the DSP and the EWSC at Broad? What is the mission and vision for each?
Anthony: The Broad Institute recognizes that the greatest transformation occurs when you can tackle challenges that are both intellectually difficult and operationally difficult. I love that.
The Life Sciences are in the midst of a data revolution. It is time for new approaches to making biology a data science, as well as ways to operationalize and disseminate those ideas at scale. At Broad, we focus on both goals.
The EWSC centers around research and innovation at the interface of biomedicine and machine learning (ML). This is a big field, and our focus is on taking the most important questions of biology and using them to drive the next generation of foundational advances in ML. This is different from the typical process of "bringing ML into biology." There is reason to believe that biology can drive ML in new directions. For instance, in biology we can conduct perturbations on a larger scale than in most fields. Similarly, we are less concerned with achieving state-of-the-art performance on a benchmark dataset and more interested in mechanism. Both of these elevate questions of causal inference to the forefront, which has not been as central to modern ML.
The DSP centers around building a scalable platform called Terra, in conjunction with Microsoft and Verily, that enables researchers to store, share and analyze genomic and clinical data at scale. The mission is to build a software platform that spans the lifecycle of biomedical data. As a modern cloud-based software platform, Terra brings different user personas together and creates value from their interactions. In particular, we are using Terra to:
Su: Very interesting — I’ll come back to Terra later. I would love to know how you think about “enterprise data management”, which is the theme of this interview series. To me, enterprise data management represents the processes that an organization undertakes to put data, which may sit across many silos and have disparate rules for use, into a unified infrastructure with a standard process to unlock insights that inform business decisions. Data includes both primary data generated directly by your organization as well as data that Broad has access to via partnerships or licensing. What else do you think about as it relates to “enterprise data management” in your role as data leader for the Broad Institute?
Anthony: That’s a great way to define the basic challenge. I’ll add that for what we do, the scope is much larger – we’re not just looking at data management for the Broad and immediate partners; we’re working to implement data management infrastructure and processes that will serve the needs of the global biomedical research community. Organizations themselves can act as data silos. In particular, many research organizations have data that would be more valuable if it could be federated across those organizations, so we are building solutions to do just that. This benefits the participating organizations since they will be able to gain more insights from their own data. This also benefits patients and study participants who donated the data to begin with.
Su: How does this level of federation help unlock more insights out of the data?
Anthony: Federated learning enables ML models trained on distributed datasets. Two primary use cases are:
Federated learning allows such insights to be unlocked, even while the data remains distributed.
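To make the idea concrete, here is a minimal sketch of federated averaging, the canonical federated-learning scheme: each organization trains on its own data locally and only model weights are aggregated centrally. This is purely illustrative (the model, data, and hyperparameters are hypothetical) and does not represent Terra's actual implementation.

```python
# Minimal sketch of federated averaging (FedAvg) across simulated data silos.
# Illustrative only -- the linear model, data, and hyperparameters below are
# hypothetical, not Terra's implementation.
import random

def local_step(weights, data, lr=0.1):
    """One local pass of gradient descent for a 1-D linear model y = w*x."""
    w = weights
    for x, y in data:
        grad = 2 * (w * x - y) * x  # derivative of squared error w.r.t. w
        w -= lr * grad
    return w

def federated_average(silos, rounds=50):
    """Each silo trains locally; only model weights ever leave the silo."""
    w_global = 0.0
    for _ in range(rounds):
        local_weights = [local_step(w_global, data) for data in silos]
        w_global = sum(local_weights) / len(local_weights)  # central aggregation
    return w_global

# Three organizations each hold private samples of the same relationship y = 3x.
random.seed(0)
silos = [[(x, 3 * x) for x in (random.random() for _ in range(20))]
         for _ in range(3)]
w = federated_average(silos)
print(round(w, 2))  # converges near the true slope, 3.0
```

The key property is in `federated_average`: raw patient-level records never move between organizations, yet the aggregated model reflects all of the distributed data.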
Su: How is Broad using federated learning in the Terra platform?
Anthony: We inverted the traditional model of data sharing: instead of having research organizations download copies of the data to their respective silos, you put the data on the cloud, with mechanisms for researchers to access and compute on it in-place. In partnership with Microsoft and Verily, we built the Terra platform to support secure data storage, data sharing and collaborative analyses with built-in tools and interfaces that are tailored for Life Sciences researchers.
Su: There are a lot of data platforms out there, so what differentiates Terra?
Anthony: We have three strategic pillars:
Third, we are building a federated data ecosystem, not a walled garden. This shows up in a few ways. We build things that are modular and not monolithic. We believe that software should be community-driven. We lean into standards such as GA4GH (Global Alliance for Genomics and Health), which Broad helped found. We are open source, have open APIs, and are committed to sharing data.
Su: How do you reconcile this drive for openness with the need for data security, privacy and compliance?
Anthony: At the platform level, we build to extremely high standards of information security — Terra is rated FedRAMP Moderate, which enables us to store sensitive data from federal projects, such as All of Us. This level of security is crucial for groups handling health data.
Beyond that, we seek to develop technologies that facilitate compliance. One example is an effort called "DUOS" (data use oversight system). Life Sciences data within Terra has two axes of access control — one based on who you are and one based on the intended use for the data. Who you are is computable, since it is based on IAM (identity and access management) systems that are widely utilized. However, intended use is not currently computable. We changed that by building an ontology that summarizes research purposes, so that they can be computed on. When we first took this to our IRB, they said "you can't automate my job!" We actually ran a trial, published recently in Cell Genomics, showing that the IRB's opinions and DUOS match up very closely!
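The core idea of computable intended use can be sketched as a term match against an ontology of research purposes: access is permitted when the dataset's use restriction is the stated purpose or one of its ancestors. The toy hierarchy and term names below are hypothetical simplifications for illustration, not the actual DUOS system or the GA4GH Data Use Ontology.

```python
# Toy illustration of computable data-use matching in the spirit of DUOS.
# The ontology terms and hierarchy here are hypothetical simplifications.

# Each term maps to its parent; a purpose satisfies a restriction if the
# restriction term is the purpose term or one of its ancestors.
ONTOLOGY = {
    "cancer-research": "disease-specific-research",
    "cardiology-research": "disease-specific-research",
    "disease-specific-research": "health-medical-biomedical",
    "methods-development": "general-research",
    "health-medical-biomedical": "general-research",
}

def ancestors(term):
    """Yield the term itself, then each ancestor up the hierarchy."""
    while term is not None:
        yield term
        term = ONTOLOGY.get(term)

def access_permitted(dataset_restriction, research_purpose):
    """True if the stated research purpose falls under the permitted use."""
    return dataset_restriction in ancestors(research_purpose)

print(access_permitted("disease-specific-research", "cancer-research"))  # True
print(access_permitted("cancer-research", "methods-development"))        # False
```

Because both the restriction and the purpose are machine-readable terms, this check can run automatically at access-request time, which is what makes intended use "computable" in the first place.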
Su: Terra is solving the problem of giving researchers access to data. What next? What’s holding the industry back from unlocking more insights from massive amounts of genomics data? Is it technology-based factors such as computing power? Is it people-based factors such as expertise with genomics?
Anthony: It’s a mix of both — scaling genomics analysis traditionally requires a lot of computing power and specialized engineering knowledge. With Terra, we can help with both. We use the cloud to make scalable computing power available to all, and we provide pre-built tools and interfaces to allow researchers to use the data without needing special training. However, there is still a lot of work to be done in terms of algorithm development to achieve next-level insights.
Su: Everything you describe here sounds extremely cross-disciplinary. In past interviews, you’ve spoken about your dream to see a health IT company where the CTO was previously the CTO of Angry Birds, the Chief Medical Officer was a practicing physician who also knows programming, and the CEO was from ad-tech — essentially this idea that Silicon Valley and the healthcare community needs to cross-pollinate so that tech and healthcare can learn from each other. How have you tried to institutionalize this vision in the team you’ve built at Broad?
Anthony: That’s exactly what we’ve done with the Data Sciences Platform. We intentionally structured it like a software development organization, with deep connections to research teams at Broad and beyond, so that all relevant expertise is immediately available to cross-pollinate ideas. I hope that we can replicate this with the Eric and Wendy Schmidt Center to turn foundational discoveries in machine learning into production-grade tools.
Su: Last question — if you had unlimited resources and could accelerate one area of what you are overseeing at the Broad Institute, what would that one area of investment be?
Anthony: I am very passionate about improving the state of common disease drug development. Drug development today is focused almost exclusively on rare diseases and cancer. There are actually few organizations seeking to develop drugs for the top 10 causes of death! For a long time that was because we lacked good targets, but human genetics has really changed that. The challenge now is the high cost of common disease clinical trials. Running a Phase 3 trial for a disease like coronary artery disease can cost more than $1B and take 5 years, because of the large number of patients you need to recruit and the long time period needed to observe events for the trial's primary endpoint.
However, there is a new generation of ML tools based on genomic and clinical data that can predict both who will be more likely to have events and who will be more likely to respond to therapies. This would allow for smaller trials. I am excited about working to develop these ML tools through the Eric and Wendy Schmidt Center, and potentially accelerate common disease drug trials.
Su: That would be a worthy cause indeed. Anthony, thank you so much for sharing these insights!
If you would like to learn more, please email Su Huang at su@datavant.com.
AnalyticsIQ, a marketing data and analytics company, recently adopted Datavant’s state de-identification process to enhance the privacy of its SDOH datasets. By undergoing this privacy analysis prior to linking its data with other datasets, AnalyticsIQ has taken an extra step that could contribute to a more efficient Expert Determination (which is required when its data is linked with others in Datavant’s ecosystem).
AnalyticsIQ’s decision to adopt state de-identification standards underscores the importance of privacy in the data ecosystem. By addressing privacy challenges head-on, AnalyticsIQ and similar partners are poised to lead clinical research forward, providing datasets that are not only compliant with privacy requirements, but also ready for seamless integration into larger datasets.
“Stakeholders across the industry are seeking swift, secure access to high-quality, privacy-compliant SDOH data to drive efficiencies and improve patient outcomes,” says Christine Lee, head of health strategy and partnerships at AnalyticsIQ.
“By collaborating with Datavant to proactively perform state de-identification and Expert Determination on our consumer dataset, we help minimize potentially time-consuming steps upfront and enable partners to leverage actionable insights when they need them most. This approach underscores our commitment to supporting healthcare innovation while upholding the highest standards of privacy and compliance.”
As the regulatory landscape continues to evolve, Datavant’s state de-identification product offers an innovative tool for privacy officers and data custodians alike. By addressing both state-specific and HIPAA requirements, companies can stay ahead of regulatory demands and build trust across data partners and end-users. For life sciences organizations, this can lead to faster, more reliable access to the datasets they need to drive research and innovation while supporting high privacy standards.
As life sciences companies increasingly rely on SDOH data to drive insights, the need for privacy-preserving solutions grows. Data ecosystems like Datavant’s, which link real-world datasets while safeguarding privacy, are critical to driving innovation in healthcare. By integrating state de-identified SDOH data, life sciences can gain a more comprehensive view of patient populations, uncover social factors that impact health outcomes, and ultimately guide clinical research that improves health.
Both payers and providers are increasingly utilizing SDOH data to enhance care delivery and improve health equity. By incorporating SDOH data into their strategies, both groups aim to deliver more personalized care, address disparities, and better understand the social factors affecting patient outcomes.
Payers increasingly leverage SDOH data to meet health equity requirements and enhance care delivery:
Payers’ consideration of SDOH underscores their commitment to improving health equity, delivering targeted care, and addressing disparities for vulnerable populations.
Capital District Physicians’ Health Plan (CDPHP) incorporated SDOH data by partnering with Papa to combat loneliness and isolation among older adults, families, and other vulnerable populations. CDPHP aimed to address:
By integrating SDOH data, CDPHP enhanced its services to deliver comprehensive care for its Medicare Advantage members.
Value-based care organizations face challenges in fully understanding their patient panels. SDOH data helps providers address these challenges and improve patient care. Here are some examples of how:
By leveraging SDOH data, providers gain a more comprehensive understanding of their patient population, leading to more targeted and personalized care interventions.
While accessing SDOH data offers significant advantages, challenges can arise from:
To overcome these challenges, providers must have robust data integration strategies, standardization efforts, and access to health data ecosystems to ensure comprehensive and timely access to SDOH data.
With Datavant, healthcare organizations are securely accessing SDOH data and further enhancing the efficiency of their datasets through state de-identification capabilities, empowering stakeholders across the industry to make data-driven decisions that drive care forward.
Key takeaway: As the volume of trials that Datavant tokenizes continues to grow, a key observation is that sponsors who integrate privacy-preserving linkage solutions early are best positioned to accelerate research, optimize commercial strategies, and ultimately advance patient care.
As trial tokenization scales across clinical development, it is evolving from a data privacy tool into a strategic asset that enhances trial design, regulatory and payor submissions, and long-term evidence generation. Sponsors that embed tokenization early in trial planning are better positioned to unlock deeper insights, drive innovation, and improve patient outcomes.
Explore how Datavant can be your health data logistics partner.
Contact us