
From Glass Slides to Pixels: The Power of Digital Pathology Data

November 16, 2023
Ross Cantor, MT(ASCP), Vice President, Data Strategy & Partnerships at Proscia

In our Ecosystem Explorer Series, we interview leaders from partner organizations who are improving access to real-world data. Today’s interview is with Ross Cantor, MT(ASCP), Vice President, Data Strategy & Partnerships at Proscia.

At Proscia, Ross draws on his extensive clinical and data expertise to build collaborations with top laboratories and biopharma organizations, expanding access to high-quality real-world data and driving scientific breakthroughs. Over his 18-year career in healthcare, Ross has held key roles, starting at the Hospital of the University of Pennsylvania, contributing to Genzyme Diagnostics, and recently serving as VP of Strategy & Business Development at Lifepoint Informatics. Throughout, he has prioritized digital health and promoted interoperability across healthcare systems, provider networks, and clinical platforms.

Proscia is accelerating pathology’s transformation to digital and using data to reshape our understanding of diseases like cancer. Its Concentriq enterprise pathology platform and powerful AI applications are unlocking new insights that accelerate R&D, better inform treatment decisions, and advance the quest for precision medicine. 14 of the top 20 pharmaceutical companies and leading diagnostic laboratories rely on Proscia’s software each day.

Introduction to pathology data and why it’s important

Ross, welcome to the Ecosystem Explorer interview series! To start off, can you give us an overview of what pathology data is, how it is used, and why it is important to healthcare researchers?

Certainly. Pathology is the study of disease, so pathology data has long played a vital role in diagnosing cancer and other diseases, informing up to 70% of clinical decisions. With the rise of digitization and precision medicine, pathology data also increasingly serves as a bridge between clinical practice and drug discovery.

Traditionally, pathology data has consisted of tissue biopsies mounted on glass slides, which pathologists (trained medical doctors) examine to make a diagnosis. We also consider pathology data to include the unstructured reports in which pathologists interpret and characterize tissue to inform the clinicians creating treatment plans.

Pathology’s shift to digital has generated a new real-world data asset: whole slide images. These high-resolution images of tissue biopsies are emerging as an incredibly rich source of insight into a patient’s disease. Each of the 1 billion whole slide images created every year is made up of over 1 billion pixels (roughly 1GB or more) and contains far more information than the eye can see.
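To put the “over 1 billion pixels” figure in context, here is a back-of-envelope calculation. It is a minimal sketch that assumes 8-bit RGB pixels (3 bytes each) and a hypothetical 40,000 × 30,000 pixel scan; real file sizes depend on the scanner, the magnification, and the compression applied.

```python
# Back-of-envelope size estimate for a whole slide image (WSI).
# Assumptions (illustrative, not tied to any specific scanner):
#   - 8-bit RGB, i.e. 3 bytes per pixel, before compression
#   - hypothetical scan dimensions of 40,000 x 30,000 pixels

BYTES_PER_PIXEL = 3  # one byte each for red, green, and blue


def wsi_size_estimate(width_px: int, height_px: int, compression_ratio: float = 1.0):
    """Return (pixel count, estimated size in GB) for a WSI.

    compression_ratio is a hypothetical divisor (3.0 means the stored file is
    one third of the raw size); 1.0 means uncompressed.
    """
    pixels = width_px * height_px
    size_gb = pixels * BYTES_PER_PIXEL / compression_ratio / 1e9
    return pixels, size_gb


pixels, raw_gb = wsi_size_estimate(40_000, 30_000)
print(f"{pixels:,} pixels, ~{raw_gb:.1f} GB uncompressed")
```

The raw pixel count alone explains why these files are so large; the stored size then depends on how aggressively the scanner compresses the image.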

The central role that pathology data plays in understanding a patient’s condition, along with the wealth of information that whole slide images contain, is what makes pathology data so valuable to R&D teams today.

What inspired the founding of Proscia, and how has the company’s focus evolved since its inception?

At Proscia, we believe that pathology deserves great technology. Pathologists are on the front lines of fighting some of humanity’s biggest challenges, like cancer. Despite the impact that software has had on almost every healthcare domain – and the world more generally – it had not begun making its mark on pathology until about five years ago. The practice had remained largely unchanged in its 150-year history, still depending on the microscope and glass slides.

Proscia was founded to deliver that great technology and to use the data it generates to reshape the way we understand disease. We began by driving pathology’s digital transformation across the life sciences. As adoption took off on the clinical side, in part due to the pandemic, we also started growing our base of diagnostic laboratories. Today, 14 of the top 20 pharmaceutical companies, along with leading diagnostic laboratories, rely on our Concentriq enterprise pathology platform for routine operations.

But this is just part of our story. In working to reshape our understanding of disease, we are helping pathology to play an even bigger role in the data-driven precision medicine paradigm. We have emerged as more than a software provider and work closely with our customers to elevate the role of pathology in the 21st century. This is where our focus on real-world data and our partnership with Datavant become especially exciting.

How do you source pathology data?

We work with a network of leading academic medical centers and commercial laboratories at the forefront of pathology’s digital transformation. With them, we are assembling one of the largest collections of diverse, labeled whole slide images of tissue biopsies from solid tumors and rare diseases. This includes images from more than 2 million patients and counting.

These whole slide images are paired with corresponding multi-modal data, including pathology reports, biomarker results, next-generation sequencing (NGS) and other molecular testing, and additional clinically relevant laboratory data captured as part of the standard of care.

Let’s talk about how pathology data fits with other types of clinical and real-world data. As you know, Datavant enables organizations to link disparate datasets at the patient level – we often see organizations linking their 1st party data to 3rd party claims and EHR data, for example, to fill in gaps in patient records and better understand disease progression. Where does pathology data fit in, and in what circumstances would researchers get more value by linking their data with whole slide images and pathology data?

Whole slide images are among the best representations of disease. They capture the cellular and tissue-level details and patterns that determine a diagnosis. The power of this data is amplified when it is incorporated into a multi-modal approach.

The impact of such an approach is especially clear when it comes to identifying new precision therapeutics. Consider the drug pembrolizumab, commonly known as Keytruda. A landmark 2016 study found that patients with non-small cell lung cancer (NSCLC) whose tumors expressed PD-L1 on at least 50% of tumor cells had longer survival when treated with pembrolizumab than with platinum-based chemotherapy. Pathology data factored heavily into this assessment of PD-L1 expression, and pembrolizumab is now the preferred treatment for these patients.
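The PD-L1 assessment in studies like this is typically reported as a tumor proportion score (TPS): the percentage of viable tumor cells showing partial or complete membrane staining for PD-L1. Here is a minimal sketch of that arithmetic and the 50% cutoff, using made-up cell counts; in practice the counts come from a pathologist’s read or an image-analysis algorithm.

```python
def tumor_proportion_score(pdl1_positive_tumor_cells: int, viable_tumor_cells: int) -> float:
    """PD-L1 tumor proportion score (TPS), as a percentage.

    TPS = PD-L1-positive viable tumor cells / all viable tumor cells * 100.
    """
    if viable_tumor_cells <= 0:
        raise ValueError("viable_tumor_cells must be positive")
    if not 0 <= pdl1_positive_tumor_cells <= viable_tumor_cells:
        raise ValueError("positive count must be between 0 and the total")
    return 100.0 * pdl1_positive_tumor_cells / viable_tumor_cells


def high_pdl1(tps: float, cutoff: float = 50.0) -> bool:
    """True if the TPS meets the cutoff (a TPS of 50% or more in the 2016 NSCLC study)."""
    return tps >= cutoff


# Hypothetical counts, for illustration only.
tps = tumor_proportion_score(pdl1_positive_tumor_cells=620, viable_tumor_cells=1000)
print(f"TPS = {tps:.0f}%, high expression: {high_pdl1(tps)}")
```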

The development of precision therapeutics is just one of many applications of pathology data in a multi-modal context. By linking pathology data with other real-world data, scientists can build a more complete longitudinal patient record to support post-market surveillance. Other use cases include novel biomarker identification and validation, patient stratification for clinical trials, and indication expansion.
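The patient-level linkage mentioned in the question can be illustrated with a toy join. This sketch assumes each dataset already carries an opaque, de-identified patient token; the token values and field names are made up for illustration and do not reflect Datavant’s format. Records that share a token are grouped, so pathology findings and claims history for the same patient end up side by side.

```python
from collections import defaultdict

# Hypothetical de-identified records; "token" stands in for a privacy-preserving
# patient identifier shared across datasets. All values are illustrative.
pathology = [
    {"token": "tok_a1", "wsi_id": "slide_001", "diagnosis": "NSCLC"},
    {"token": "tok_b2", "wsi_id": "slide_002", "diagnosis": "colorectal adenocarcinoma"},
]
claims = [
    {"token": "tok_a1", "procedure": "chemotherapy infusion"},
    {"token": "tok_a1", "procedure": "CT chest"},
    {"token": "tok_c3", "procedure": "office visit"},
]


def link_by_token(left, right):
    """Inner-join two record lists on the shared 'token' field.

    Returns {token: {"left": [...], "right": [...]}} for tokens present in both.
    """
    grouped = defaultdict(lambda: {"left": [], "right": []})
    for rec in left:
        grouped[rec["token"]]["left"].append(rec)
    for rec in right:
        grouped[rec["token"]]["right"].append(rec)
    # Keep only patients who appear in both datasets.
    return {t: g for t, g in grouped.items() if g["left"] and g["right"]}


linked = link_by_token(pathology, claims)
for token, group in linked.items():
    print(token, [r["wsi_id"] for r in group["left"]], [r["procedure"] for r in group["right"]])
```

A real linkage workflow also involves governance and privacy review before datasets are combined, which this toy version ignores.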

Those are exciting applications, and it’s great to hear pathology data is already having an impact on clinical R&D and patient care. Are there any other success stories you’d like to share that highlight the impact of connecting pathology data to real-world data?

A top pharmaceutical company is using multi-modal data that we curated, including whole slide images linked to other clinically relevant laboratory data, to develop an AI application. The application aims to enable scientists to predict a known lung cancer biomarker by identifying patterns, or signals, within the pixels of whole slide images.
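Because a whole slide image is far too large to feed into a model in one piece, slide-level prediction models commonly split the image into tiles, score each tile, and aggregate the scores into one slide-level prediction. The sketch below illustrates that general pattern with attention-style weighted pooling, a common multiple-instance learning choice. It is not the pharmaceutical company’s method; the tile size, slide dimensions, and per-tile scores are hypothetical placeholders for what a trained model would produce.

```python
import math


def tile_grid(width_px, height_px, tile_px=256):
    """Yield (x, y) top-left corners of non-overlapping tiles covering the slide.

    Partial tiles at the right and bottom edges are dropped for simplicity.
    """
    for y in range(0, height_px - tile_px + 1, tile_px):
        for x in range(0, width_px - tile_px + 1, tile_px):
            yield (x, y)


def attention_pool(tile_probs, attention_logits):
    """Combine per-tile biomarker probabilities into one slide-level score.

    Attention weights are a softmax over the logits, so tiles the model deems
    more informative contribute more to the final score.
    """
    m = max(attention_logits)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in attention_logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    return sum(w * p for w, p in zip(weights, tile_probs))


# A hypothetical 40,000 x 30,000 pixel slide yields thousands of 256-pixel tiles.
n_tiles = sum(1 for _ in tile_grid(40_000, 30_000))
print("tiles:", n_tiles)

# Hypothetical model outputs for four tiles.
probs = [0.9, 0.2, 0.8, 0.1]
logits = [2.0, 0.0, 1.5, -1.0]
print("slide-level score:", round(attention_pool(probs, logits), 3))
```

In a trained model, both the per-tile probabilities and the attention logits would be learned outputs; here they are fixed numbers so the pooling step is easy to follow.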

This biomarker is critical to ensuring effective, targeted treatment; however, it can currently be identified only through expensive, time-consuming molecular tests. As a result, not every patient has access to this testing. Waiting for results can also delay the start of treatment, which can especially affect outcomes for patients with advanced disease.

In the long term, the pharmaceutical company hopes to pave the way for more personalized treatments by identifying the biomarker and then analyzing patients’ responses to therapy.

The constraints and challenges with pathology data

Let’s discuss some of the challenges associated with pathology data. Are there difficulties with collecting and managing large volumes of pathology data? You mentioned that whole slide images are quite large.

When it comes to collecting data, pathology reports are unstructured, and the corresponding whole slide images are not organized in a way that makes them easy to search. (This is especially challenging for AI development and training.) Whole slide images also come in many different file formats and live in many different systems. Adding to the complexity, diagnostic laboratories keep much of the clinically relevant data that corresponds to whole slide images in a separate laboratory information system, which is often siloed as well.

While file format compatibility can make it difficult to manage data in one central system, the biggest data management issue is almost always the sheer size of whole slide images. They are massive, on the order of 1GB each, which is 2 to 10 times larger than an average radiology image. Laboratories and R&D teams are not used to working with such large files, and their systems may not scale to support them.
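To see why existing systems may struggle to scale, consider how quickly storage accumulates. This is a minimal sketch assuming a hypothetical lab that scans 500 slides per working day at roughly 1GB per slide over 250 working days a year; substitute your own volumes.

```python
def annual_storage_tb(slides_per_day: int, gb_per_slide: float, working_days: int = 250) -> float:
    """Estimated WSI storage added per year, in terabytes (1 TB = 1,000 GB)."""
    return slides_per_day * gb_per_slide * working_days / 1_000


# Hypothetical volumes, for illustration only.
print(f"~{annual_storage_tb(500, 1.0):.0f} TB per year")
```

For these inputs the estimate is about 125 TB of new images per year, before counting any redundant copies or backups.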

We built our Concentriq enterprise pathology platform to overcome many of these challenges by unifying teams, data, and applications. It serves as the system of record for whole slide images in a variety of file formats and is designed to integrate with the laboratory information system, bringing other pathology data into one central location. Concentriq is also built to scale to massive volumes of data.

As a result, diagnostic laboratories can manage all of their data in a way that makes it easy to collect, and R&D teams can centralize their data so that it is easy to incorporate into studies. R&D teams can also use Concentriq’s developer platform to build their own AI models against very large datasets and deploy them into routine operations.

How do you address concerns about data privacy and security, especially given that pathology data is unstructured?

Data privacy and security are top priorities for us, and we take careful measures to address them. Our Concentriq platform is HIPAA-compliant. All protected health information (PHI) and personally identifiable information (PII) stay in the laboratory’s environment, where the data is de-identified before it is made accessible to the research organization. Our team also carries out a rigorous validation process to ensure the data is accurate.
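De-identifying unstructured report text is harder than dropping columns from a table, because identifiers are embedded in free text. The following is a deliberately simplified, rule-based sketch that uses regular expressions for a few identifier patterns (dates, a hypothetical “MRN:” medical record number label, and phone numbers). It is not Proscia’s process; real HIPAA de-identification must cover many more identifier types and is typically validated against annotated data.

```python
import re

# Illustrative patterns only; production de-identification covers many more
# identifier types (names, addresses, and so on) and is validated on real data.
PATTERNS = [
    (re.compile(r"\bMRN[:#]?\s*\d+\b", re.IGNORECASE), "[MRN]"),  # hypothetical MRN label
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),         # e.g. 03/14/2023
    (re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"), "[PHONE]"),    # e.g. 215-555-0100
]


def scrub(report_text: str) -> str:
    """Replace matches of each identifier pattern with a placeholder tag."""
    for pattern, tag in PATTERNS:
        report_text = pattern.sub(tag, report_text)
    return report_text


sample = ("MRN: 123456. Specimen received 03/14/2023. "
          "Call 215-555-0100 with questions. Diagnosis: adenocarcinoma.")
print(scrub(sample))
```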

Datavant’s tokenization and de-identification process adds another layer of protection to everything we do, supporting regulatory compliance, patient anonymity, and privacy. This is just one of the many synergies of our partnership.
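In general terms, tokenization replaces identifying fields with a consistent, one-way value so that records for the same patient can be matched without exposing PHI. The sketch below shows the idea with a keyed hash (HMAC-SHA256) over normalized identifiers. It illustrates the general technique and is not Datavant’s actual algorithm; the choice of fields, the normalization rules, and the secret key are all assumptions.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    """Canonicalize an identifier so trivial formatting differences still match."""
    value = unicodedata.normalize("NFKD", value)
    value = "".join(ch for ch in value if not unicodedata.combining(ch))  # strip accents
    return "".join(ch for ch in value.upper() if ch.isalnum())  # drop spaces and punctuation


def make_token(secret_key: bytes, *fields: str) -> str:
    """Derive a deterministic token from identifying fields via HMAC-SHA256.

    The same inputs and key always produce the same token, which enables matching,
    while someone without the key cannot recompute tokens from guessed identifiers.
    """
    message = "|".join(normalize(f) for f in fields).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


KEY = b"hypothetical-secret-key"  # in practice, a securely managed secret

t1 = make_token(KEY, "José", "Smith", "1970-01-31")
t2 = make_token(KEY, "jose", "SMITH ", "19700131")
print(t1 == t2)  # differently formatted inputs for the same person yield the same token
```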

How does Proscia process and curate pathology data to make it useful for researchers and organizations?

After data is de-identified, our team uses AI and machine learning to extract relevant details from unstructured pathology reports and other data types, following FAIR (findable, accessible, interoperable, reusable) principles to assemble large, complex data cohorts quickly. Pathologists then further curate the data, annotating whole slide images to identify the most relevant ones for a particular case and to mark regions of interest. These steps let us make clean, ready-to-use data available to scientists on Concentriq for both their R&D activities and AI model development.
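Turning free-text reports into structured fields is at the core of this curation step. Proscia describes using AI and machine learning for it; the sketch below substitutes simple regular expressions purely to show the shape of the input and output. The report wording and field names are hypothetical.

```python
import re

# Hypothetical report snippet; real reports vary widely in wording and layout.
REPORT = """
DIAGNOSIS: Lung, left upper lobe, biopsy: Adenocarcinoma.
PD-L1 (22C3) tumor proportion score: 60%.
EGFR mutation: Not detected.
"""


def extract_fields(text: str) -> dict:
    """Pull a few structured fields out of free text with regular expressions."""
    fields = {}
    diag = re.search(r"DIAGNOSIS:\s*(.+)", text)
    if diag:
        fields["diagnosis"] = diag.group(1).strip()
    tps = re.search(r"tumor proportion score:\s*(\d+)\s*%", text, re.IGNORECASE)
    if tps:
        fields["pdl1_tps_percent"] = int(tps.group(1))
    egfr = re.search(r"EGFR mutation:\s*(Detected|Not detected)", text, re.IGNORECASE)
    if egfr:
        fields["egfr_mutation_detected"] = egfr.group(1).lower() == "detected"
    return fields


print(extract_fields(REPORT))
```

Hand-written rules like these break easily on real-world variation in phrasing, which is one reason learned models are attractive for this step.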

Future opportunities for pathology data

Now that we’ve covered the challenges, let’s look at the opportunities with pathology data. How do you think digital pathology is improving the field of medicine, particularly in cancer research and diagnostics?

On the research side, we’ve covered many of the ways that digital pathology and pathology data are advancing precision medicine. It’s also worth noting that they are making an impact on day-to-day operations. A survey of major pharmaceutical companies and contract research organizations conducted earlier this year found that 70% of respondents had already invested in digital pathology. 83% of them adopted it to improve collaboration, as sharing whole slide images is much more efficient than transporting glass slides. In turn, they can build networks of internal and external collaborators around the world to best carry out their studies.

On the diagnostic side, digital pathology is helping to drive meaningful efficiency gains to overcome the growing shortage of pathologists and rising cancer burden. It is also resulting in improved accuracy; AI applications are able to unlock new, clinically impactful insights to aid pathologists in making a diagnosis.

What’s perhaps most exciting is that we are seeing a flywheel effect as innovations from the life sciences increasingly make their way into diagnostic laboratories. These innovations, like PD-L1 detection algorithms, are helping to accelerate adoption among diagnostic laboratories, generating more real-world data to fuel research breakthroughs.

What do you see as the most exciting opportunities for researchers and organizations working with pathology data in the coming years?

The consumerization of AI, and specifically of advanced techniques like generative AI, is expanding the potential of pathology data like never before and lowering the barrier to entry for developing AI applications. This is driving unprecedented demand for both high-quality data and the data scientists who can tap into its full potential. In fact, we’re seeing some pharmaceutical companies rapidly expand their data science teams to build generative AI solutions for a wide range of use cases, including identifying and validating new biomarkers, assessing target compounds, and predicting drug toxicity.

It goes without saying that we’re in the first inning of realizing the promise of generative AI, and the opportunities for pathology data will only grow.

Are there any other innovations in this space that you are particularly excited about?

There are so many exciting innovations we could highlight. One in particular, the vision transformer, is at the center of so much AI development in pathology that it deserves special mention. A vision transformer is a type of AI model that processes an image by splitting it into small patches and treating them as a sequence of tokens, much as a language model treats words. Meta’s Segment Anything Model (SAM), which shows promise for accelerating pathology image segmentation, is among the best-known examples and has sparked a significant wave of AI R&D. Innovations like SAM will make it easier than ever to gather data for projects that require pixel-level annotations. SAM has made such an impact in part because it is also a foundation model that can be used for many downstream tasks.
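The core idea of a vision transformer is that an image is cut into fixed-size patches, and each patch is flattened into a vector that the transformer treats like a word token. Here is a minimal sketch of just that patching step in pure Python, assuming a tiny single-channel image stored as nested lists. Real models use tensors, larger patches (16 × 16 is a common choice), and a learned linear projection of each flattened patch, all of which are omitted here.

```python
def to_patches(image, patch):
    """Split a 2D image (list of equal-length rows) into flattened patches.

    Patches are read in raster order (left to right, top to bottom), the
    ordering commonly used to form a vision transformer's token sequence.
    Image height and width must be multiples of `patch`.
    """
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image dimensions must be multiples of the patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            flat = [image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)]
            tokens.append(flat)
    return tokens


# A 4x4 toy image split into 2x2 patches gives 4 tokens of length 4.
img = [[0, 1, 2, 3],
       [4, 5, 6, 7],
       [8, 9, 10, 11],
       [12, 13, 14, 15]]
tokens = to_patches(img, 2)
print(len(tokens), tokens[0])  # 4 [0, 1, 4, 5]
```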

Beyond SAM, other foundation models are driving the creation of new pathology-related applications. Existing vision transformers are already making it faster and easier to build new AI models based solely on whole slide images.

These vision transformers are also bringing digital pathology into the world of multi-modality. Vision-language models like CONCH are enabling a wide range of previously unavailable capabilities, like searching pathology images via text and automatically generating image captions. This interaction with images through text is just the beginning; it’s not a far stretch to envision combining pathology images and text with other modalities to reshape how we interact with and leverage the relationships among data modalities.
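Text-based image search in vision-language models generally works by mapping images and text into a shared embedding space and ranking images by their similarity to the text query. The sketch below shows only the ranking step, using cosine similarity. The three-dimensional embeddings and tile IDs are invented for illustration, and nothing here uses CONCH’s actual interface; a real model would produce much higher-dimensional embeddings from its image and text encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def search(query_embedding, image_embeddings, top_k=2):
    """Rank image IDs by cosine similarity to a text-query embedding."""
    scored = [(cosine(query_embedding, emb), img_id) for img_id, emb in image_embeddings.items()]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [img_id for _, img_id in scored[:top_k]]


# Invented embeddings standing in for outputs of a model's image encoder.
images = {
    "slide_A_tile_3": [0.9, 0.1, 0.0],
    "slide_B_tile_7": [0.1, 0.9, 0.2],
    "slide_C_tile_1": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # invented embedding standing in for an encoded text query
print(search(query, images))
```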

Such a model already exists in the natural image space (e.g., ImageBind), and pathology is now following close on its heels. Future foundation models may similarly incorporate other data modalities and flexibly associate -omics data with histological patterns. All of these examples point to one clear trend: more and more data is becoming usable to fuel AI development, unlocking insights that bring personalized therapies to patients faster and advance precision medicine.

Ross, thank you again for the deep dive on pathology data and sharing your excitement for the advancements in this space. Do you have any recommendations for our readers if they want to learn more?

Thanks for the opportunity! Here are some helpful links:

This interview is part of our Ecosystem Explorer Series, in which we interview leaders from partner organizations who are improving access to health data. Contact us if you’re interested in participating in this series.

