
Data & Analytics Thought Leader: Paul Petraro, Boehringer Ingelheim

Publish Date: August 11, 2022
Paul Petraro, Executive Director and Global Head of the Real World Evidence Analytic (RWE) Center of Excellence at Boehringer Ingelheim

In our Data & Analytics Thought Leader Series, Datavant’s Head of Data Strategy, Su Huang, interviews leaders who are responsible for defining the teams and processes for managing data and advancing data-driven use cases at their organization. Today’s interview is with Paul Petraro, Executive Director and Global Head of the Real World Evidence Analytic (RWE) Center of Excellence at Boehringer Ingelheim.

With a doctorate in epidemiology, Paul’s career has taken him from academia to global public health development, from analytics roles to senior leadership positions in HEOR and pharmacoepidemiology at some of the largest life sciences companies. For the last three years, Paul has been at Boehringer Ingelheim (BI) as Executive Director and Global Head of the Real World Evidence Analytic Center of Excellence (RWE COE). In this role, Paul is responsible for creating a centralized data strategy under his COE to support Medicine, HEOR and other functions. His team is also charged with developing real world data (RWD) competency that spans the clinical development lifecycle, crossing different therapeutic areas and global regions.

Su:  Paul, welcome to this interview series with data and analytics leaders in healthcare! Your diverse background positioned you well to lead the RWE Center of Excellence (COE). It’d be great to learn more about this COE concept at Boehringer Ingelheim (BI). How are the different COEs organized and what is the company’s goal in setting up these COEs?

Paul: I believe that BI views COEs as bringing expertise to the rest of the organization. For the company to achieve its goals in using RWE, the organization needs experts in the RWE space, so the COE builds that competency. Whether it’s HEOR or clinical development, and regardless of therapeutic area or focus, we want those business or functional leaders to first think: Let me consult the COE for their expertise.

The other benefit is centralization. For our RWE COE, we bring together real world data (RWD) in a central place so that the entire organization can leverage that resource. Historically, you have seen multiple departments within a pharma company licensing the same assets to work on similar projects.

My team makes the data accessible in a compliant way, accounting for legal, regulatory, and governance rules. It doesn’t matter where you sit in the organization, whether that’s commercial, clinical development, or the medicine side, or whether you’re in the US, Europe, or Asia: if you need access to data, you come to the COE.

Essentially, we’re bringing together data expertise that we lend to the rest of the organization.

Su: The RWE COE seems to be a great example of a large pharma company building a competency around enterprise data management, which is the theme of this interview series. What does having this centralized function around RWD management and processes enable BI to do that was not possible before the COE was in place?

Paul: It’s really about speed and quality. A lot of times when you talk to scientists, speed means poor quality, lack of robustness, and messiness. But with the enterprise COE concept, speed is about maintaining efficiency and robust quality.

Another aspect is access to data via the cloud, no matter where you are located geographically. Once it’s cloud based, there is also a layer of data governance and patient privacy built in. Personal information is all anonymized. There’s a rigorous process to gain access to data and a separate process to gain access to analytical tools. Your access is limited to just what is needed for your project. Having a defined process helps us work a lot faster because we don’t have to worry about figuring out, ad hoc, how to get someone a certain cut of data or a certain application to work with that data.

There’s also cataloging of all this data. There is so much data (administrative claims, medical records, clinical trials, operations, customer data, etc.) that needs to be cataloged. This is something the organization is spending a lot of effort on. If you don’t know what data you have, don’t know where to start, or don’t have the ability to search catalogs with metadata tags, then we’ve lost the effort already.

Su:  You just mentioned a lot of first party data like clinical trial data, operations data, customer data. How do you use this data today for analytics?

Paul: It gets massive quickly when you include first party generated data such as patient apps, wearables, your iPhone, your Fitbit, etc. We keep that separate. We want to expand upon that but we’re not there yet. And we’re not even including social media data yet. We’re trying to keep it manageable by focusing on data we can link with blockchain or tokenization solutions that connect the datasets together.

More isn’t always better. We need the right use cases to connect data. We hear “fit-for-purpose” a lot in life sciences and that’s how regulators think about things. Fit-for-purpose means we have to find the right data for the right question. I don’t want to bring together all this data, spend a lot of money on it, and then we have no use for it. That’s worse than not having the data in the first place.

Su:  I’d like to shift to use cases. What is the range of use cases that your team supports? How do you see use of RWE differ across different parts of the pharma lifecycle? What are some of the unique challenges for each group, from clinical development to HEOR to commercial and other teams?

Paul:  There are many, many use cases. On one end of the spectrum is general feasibility to understand population size in a disease area, all the way to very complicated use cases utilizing advanced data science techniques. Right now, we focus more on the middle, which is research studies using advanced methodologies but not getting fully into AI or machine learning or NLP. We’ll participate in those but that’s not our bread and butter. The focus is also not on general feasibility. We want to make sure we have tools that the organization can leverage to support feasibility counts, so those requests aren’t coming to the COE each time.

In clinical development, it is harder because we usually go directly to sophisticated use cases. Can we use RWD for an external comparator? Can we use it in our submission to FDA or EMA? This is not something we’ve done a lot of, so there are fewer people with expertise. We are advancing our experience in RWD for clinical development use cases, so it may not be regulatory submission but rather using RWD to support trial recruitment or something like that.

I like to think about use cases in terms of quick wins and where we can maximize impact without complicating the development program. You may not want to go directly to using an external comparator if that prevents marketing authorization or delays your development program. Instead, let’s consider how RWD can speed up your development program by helping you understand the patient population to recruit into the trial, or by conducting natural history studies to support evidence that you submit to the authorities. These things can add a lot of value and mitigate risk.

Once we hit marketing authorization in the post-marketing space, RWD is our bread and butter.

We’ve used it a lot and there’s a lot more expertise. Use cases like post-marketing safety, access programs, and HEOR studies will continue, and we need to think about how to speed up traditional studies. We do not need to wait until marketing authorization to start working on RWD analyses; we can use RWD to generate insights on the market landscape to prepare for launch.

Su:  Are there certain therapeutic areas where RWD is utilized a lot more today than other areas? In what therapeutic areas do you think more RWD ought to be applied to have more impact?

Paul:  I think RWD should be used across the life cycle and across diseases, but it’s easier to use in chronic diseases today because there are more patients, so it is easier to reach a sufficient study population. We are constantly expanding and growing into broader diseases and areas such as oncology and rare diseases where the unmet need is huge and RWE can add a tremendous amount of value to the broader population.

In the rare disease space, it becomes a lot more difficult because we need a certain sample size to do the work. In some rare diseases, we may need a variety of data, and it is harder to get access to all that connected data. I’d like to see us do more in rare diseases, and technologies such as tokenization, along with the ability to link disparate data sources, will help us move forward. I’d like to see us think more outside the box so that we’re not only using administrative claims. What else is available? How can we get the data we need?

At the end of the day, this is extremely important because it goes back to the patient. How can we reduce the burden on the patient, in a clinical trial for instance, and leverage data we already have? It would also engage patients to know that we’re using their data appropriately for research purposes, and that we no longer need to ask hundreds of questions because we can see the answers in their RWD.

Su:  That makes a lot of sense! Zooming out, if you were to rate the pharma industry on a maturity scale for use of RWD (1 = nascent; 5 = mature), where do you think the industry is today?

Paul: It is between 3 and 4. If the question is specifically about RWD for clinical development or to support regulatory submission, I’d rate us a 2 at most. Life sciences and pharma aren’t as bad as some would suggest, and we are also not as good as others would suggest, although we’re moving in the right direction. In general, big pharma is never going to be like a tech startup.

Su:  Last question – if you had no constraints and could accelerate one area of what you are overseeing, what would that one area of investment be?

Paul:  Being able to link disparate data sources, whether it is with a token or blockchain, is critical. If I could do that across all our data sources, bringing together genomic data with clinical data and lab data, that’s where I would invest. In particular, the need for genomics data is huge. The future I want to build is one where a clinical development researcher who is interested in a specific population to recruit will look to our HEOR and epidemiology teams first (versus external data sources) to see if those target patient populations already exist and we can understand their outcomes as quickly as possible and in a robust manner. Similarly, how do we leverage the studies that clinical development teams are conducting, to share learnings with teams in HEOR and Epi? There needs to be consistency across the drug development lifecycle.

Su:  That vision of collaboration is a great one to strive for. Paul, thank you so much for sharing these insights with me today. Really appreciate your time!

Disclaimer:  The opinions expressed in this interview are solely those of the presenter and not necessarily those of Boehringer Ingelheim Pharmaceuticals Inc. Boehringer Ingelheim Pharmaceuticals Inc does not guarantee the accuracy or reliability of the information provided herein.

If you would like to learn more about this series, please email Su Huang at

