Document Type

Technical Report

Publication Date

Spring 2025

Technical Report Number

TR2025-1004

Faculty Approver

Soroush Vosoughi

Abstract

As digital pathology becomes more widely adopted, it is critical to develop machine learning methods that can make use of the resulting image data. While other image modalities have seen rapid growth in available methods, the same has not been true for histopathology images. This is likely in part because histopathology whole slide images have unique characteristics that prevent existing methods from being applied as-is.

In this thesis, we identify and propose solutions to three open problems with histopathology images: (1) large raw image size (up to 150,000×150,000 pixels), (2) low class-positivity (a low ratio of positive to negative patches), and (3) limited image availability, with existing images having weak or no labels. We address the large raw image size problem by designing a knowledge distillation-based approach that significantly reduces computational cost with only a modest decrease in classification performance. The computational cost reductions are substantial enough to enable real-time use in clinical scenarios. For the low class-positivity problem, we develop a custom view generation approach for self-supervised representation learning. This approach takes advantage of the low class-positivity to increase the number of possible view pairings and produce better classification outcomes. Lastly, we present an image generation approach that uses existing pairs of images and spatial transcriptomics data to generate synthetic histopathology patches. We demonstrate that these generated patches are clinically useful through evaluations including nuclei distribution quantification and downstream tasks.
