-
PathAlign: A vision-language model for whole slide images in histopathology
Authors:
Faruk Ahmed,
Andrew Sellergren,
Lin Yang,
Shawn Xu,
Boris Babenko,
Abbi Ward,
Niels Olson,
Arash Mohtashamian,
Yossi Matias,
Greg S. Corrado,
Quang Duong,
Dale R. Webster,
Shravya Shetty,
Daniel Golden,
Yun Liu,
David F. Steiner,
Ellery Wulczyn
Abstract:
Microscopic interpretation of histopathology images underlies many important diagnostic and treatment decisions. While advances in vision-language modeling raise new opportunities for analysis of such images, the gigapixel-scale size of whole slide images (WSIs) introduces unique challenges. Additionally, pathology reports simultaneously highlight key findings from small regions while also aggregating interpretation across multiple slides, often making it difficult to create robust image-text pairs. As such, pathology reports remain a largely untapped source of supervision in computational pathology, with most efforts relying on region-of-interest annotations or self-supervision at the patch level. In this work, we develop a vision-language model based on the BLIP-2 framework using WSIs paired with curated text from pathology reports. This enables applications utilizing a shared image-text embedding space, such as text or image retrieval for finding cases of interest, as well as integration of the WSI encoder with a frozen large language model (LLM) for WSI-based generative text capabilities such as report generation or AI-in-the-loop interactions. We utilize a de-identified dataset of over 350,000 WSIs and diagnostic text pairs, spanning a wide range of diagnoses, procedure types, and tissue types. We present pathologist evaluation of text generation and text retrieval using WSI embeddings, as well as results for WSI classification and workflow prioritization (slide-level triaging). Model-generated text for WSIs was rated by pathologists as accurate, without clinically significant error or omission, for 78% of WSIs on average. This work demonstrates exciting potential capabilities for language-aligned WSI embeddings.
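In a shared image-text embedding space, retrieval comes down to nearest-neighbour search. The minimal sketch below assumes the WSI and text encoders have already mapped their inputs to fixed-length vectors. The small hand-picked vectors are placeholders, not PathAlign outputs, and the function names are hypothetical. Slides are ranked by cosine similarity to a text query:

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale each embedding to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def retrieve(query_emb, gallery_embs, k=2):
    """Return indices and scores of the k gallery embeddings most similar to the query."""
    q = l2_normalize(query_emb)
    g = l2_normalize(gallery_embs)
    scores = g @ q                      # cosine similarity of every gallery item to the query
    top = np.argsort(-scores)[:k]       # highest similarity first
    return top, scores[top]

# Hypothetical slide embeddings in a 4-dimensional shared space.
slide_embs = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
])
# Hypothetical text-query embedding, closest in direction to slide 1.
text_query = np.array([0.1, 0.9, 0.0, 0.1])
idx, sims = retrieve(text_query, slide_embs, k=2)
print(idx)  # [1 4]
```

Image-to-text retrieval works the same way with the roles swapped: a slide embedding becomes the query and a bank of text embeddings becomes the gallery.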
Submitted 27 June, 2024;
originally announced June 2024.
-
A Matrix Exponential Generalization of the Laplace Transform of Poisson Shot Noise
Authors:
Nicholas R. Olson,
Jeffrey G. Andrews
Abstract:
We consider a generalization of the Laplace transform of Poisson shot noise defined as an integral transform with respect to a matrix exponential. We call this integral transform the matrix Laplace transform, given its similarity to the Laplace-Stieltjes transform. We establish that the matrix Laplace transform is in general a natural matrix function extension of the typical scalar Laplace transform, and that the matrix Laplace transform of Poisson shot noise admits an expression that is analogous to the expression implied by Campbell's theorem for the Laplace functional of a Poisson point process. We demonstrate the utility of our generalization of Campbell's theorem in two important applications: the characterization of a Poisson shot noise process and the derivation of the complementary cumulative distribution function (CCDF) of signal-to-interference-plus-noise ratio (SINR) models with phase-type distributed fading powers. In the former application, we demonstrate how the higher order moments of a linear combination of samples of a Poisson shot noise process may be obtained directly from the elements of its matrix Laplace transform. We further show how arbitrarily tight approximations and bounds on the CCDF of this object may be obtained from the sum of the first row of its matrix Laplace transform. For the latter application, we show how the CCDF of SINR models with phase-type distributed fading powers may be obtained in terms of an expectation of the matrix Laplace transform of the interference and noise, analogous to the canonical case of SINR models with Rayleigh fading.
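A Campbell-type expression can be checked numerically. Take Poisson shot noise I = sum over x in Phi of h(x). All the matrices exp(-h(x) A) commute, so conditioning on the number of points gives E[exp(-I A)] = exp(-lambda * integral of (Id - exp(-h(x) A)) dx). The sketch below compares a Monte Carlo estimate with that closed form. It rests on several assumptions chosen for illustration: a 1-D Poisson process of intensity lambda on [0, R], a response function h(x) = (1 + x)^-2, and a diagonalizable matrix A with real positive eigenvalues, so that every matrix function can be evaluated in A's eigenbasis. It is not the authors' derivation or code.

```python
import numpy as np

def h(x):
    # Hypothetical non-negative response function (a path-loss-like decay).
    return 1.0 / (1.0 + x) ** 2

def eig_basis(A):
    # Assumes A is diagonalizable with real eigenvalues, so f(A) = V diag(f(d)) V^{-1}.
    d, V = np.linalg.eig(A)
    V = V.real
    return d.real, V, np.linalg.inv(V)

def matrix_laplace_mc(A, lam, R, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[expm(-I A)], I = sum of h(x) over a PPP(lam) on [0, R]."""
    rng = np.random.default_rng(seed)
    counts = rng.poisson(lam * R, size=n_samples)      # number of points per realization
    pos = rng.uniform(0.0, R, size=counts.sum())        # all points, concatenated
    owner = np.repeat(np.arange(n_samples), counts)     # realization index of each point
    shot = np.bincount(owner, weights=h(pos), minlength=n_samples)
    d, V, Vinv = eig_basis(A)
    # expm(-I A) = V diag(exp(-I d)) V^{-1}; expectation is linear, so average the diagonal.
    mean_diag = np.exp(-shot[:, None] * d[None, :]).mean(axis=0)
    return V @ np.diag(mean_diag) @ Vinv

def matrix_laplace_campbell(A, lam, R, n_grid=2001):
    """Campbell-type form expm(-lam * integral_0^R (Id - expm(-h(x) A)) dx)."""
    d, V, Vinv = eig_basis(A)
    xs = np.linspace(0.0, R, n_grid)
    f = 1.0 - np.exp(-h(xs)[:, None] * d[None, :])          # integrand in the eigenbasis
    dx = xs[1] - xs[0]
    integral = dx * (f.sum(axis=0) - 0.5 * (f[0] + f[-1]))  # trapezoid rule
    return V @ np.diag(np.exp(-lam * integral)) @ Vinv

A = np.array([[2.0, -1.0], [0.0, 3.0]])   # hypothetical matrix with eigenvalues 2 and 3
mc = matrix_laplace_mc(A, lam=1.0, R=5.0)
cf = matrix_laplace_campbell(A, lam=1.0, R=5.0)
print(np.round(mc, 3))
print(np.round(cf, 3))
```

For a 1x1 matrix [[s]], both functions reduce to the scalar Laplace transform E[exp(-s I)]. This mirrors the "natural matrix function extension" described in the abstract.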
Submitted 7 June, 2024;
originally announced June 2024.
-
Optimized Sparse Matrix Operations for Reverse Mode Automatic Differentiation
Authors:
Nicolas Nytko,
Ali Taghibakhshi,
Tareq Uz Zaman,
Scott MacLachlan,
Luke N. Olson,
Matt West
Abstract:
Sparse matrix representations are ubiquitous in computational science and machine learning and, for problems with local connectivity, significantly reduce compute time compared with dense representations. However, the adoption of sparse representations in leading ML frameworks such as PyTorch is incomplete, with support for both automatic differentiation and GPU acceleration missing. In this work, we present an implementation of a CSR-based sparse matrix wrapper for PyTorch with CUDA acceleration for basic matrix operations, as well as automatic differentiability. We also present several applications of the resulting sparse kernels to optimization problems, demonstrating their ease of implementation and comparing their performance against their dense counterparts.
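A sparse wrapper needs a reverse-mode rule for the sparse matrix-vector product y = A x, and that rule can be sketched in plain NumPy. Given the upstream gradient dL/dy, the gradient with respect to x is A^T (dL/dy). The gradient with respect to the stored values is (dL/dy)_i * x_j, evaluated only at the nonzero positions, so it keeps A's sparsity pattern. The function names below are hypothetical. This is a CPU sketch of the math, not the authors' PyTorch/CUDA implementation.

```python
import numpy as np

def csr_matvec(indptr, indices, data, x):
    """Forward pass: y = A @ x for A stored in CSR form (indptr, indices, data)."""
    n_rows = len(indptr) - 1
    row_of = np.repeat(np.arange(n_rows), np.diff(indptr))   # row index of every stored entry
    return np.bincount(row_of, weights=data * x[indices], minlength=n_rows)

def csr_matvec_vjp(indptr, indices, data, x, grad_y):
    """Backward pass for y = A @ x: returns (dL/d data, dL/d x) given dL/dy = grad_y."""
    n_rows = len(indptr) - 1
    row_of = np.repeat(np.arange(n_rows), np.diff(indptr))
    # dL/dA_ij = grad_y[i] * x[j], evaluated only at stored entries (sparsity is preserved).
    grad_data = grad_y[row_of] * x[indices]
    # dL/dx = A^T @ grad_y: scatter each stored entry's contribution into its column.
    grad_x = np.bincount(indices, weights=data * grad_y[row_of], minlength=x.shape[0])
    return grad_data, grad_x

# CSR form of the dense matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]].
indptr = np.array([0, 2, 3, 5])
indices = np.array([0, 2, 1, 0, 2])
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x = np.array([1.0, 2.0, 3.0])
y = csr_matvec(indptr, indices, data, x)                          # [7, 6, 19]
g_data, g_x = csr_matvec_vjp(indptr, indices, data, x, np.ones(3))
print(y, g_x, g_data)                                             # g_x: column sums [5, 3, 7]
```

Returning gradients only for the stored values avoids materializing a dense n-by-n gradient, which is what makes such a wrapper practical inside an autodiff framework.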
Submitted 9 November, 2023; v1 submitted 9 December, 2022;
originally announced December 2022.
-
Coverage and Rate of Joint Communication and Parameter Estimation in Wireless Networks
Authors:
Nicholas R. Olson,
Jeffrey G. Andrews,
Robert W. Heath Jr
Abstract:
From an information theoretic perspective, joint communication and sensing (JCAS) represents a natural generalization of communication network functionality. However, it requires the re-evaluation of network performance from a multi-objective perspective. We develop a novel mathematical framework for characterizing the sensing and communication coverage probability and ergodic rate in JCAS networks. We employ a formulation of sensing parameter estimation based on mutual information to extend the notions of coverage probability and ergodic rate to the radar setting. We define sensing coverage probability as the probability that the rate of information extracted about the parameters of interest associated with a typical radar target exceeds some threshold, and sensing ergodic rate as the spatial average of the aforementioned rate of information. Using this framework, we analyze the downlink sensing and communication coverage and rate of a mmWave JCAS network employing a shared waveform, directional beamforming, and monostatic sensing. Leveraging tools from stochastic geometry, we derive upper and lower bounds for these quantities. We also develop several general technical results, including: i) a generic method for obtaining closed-form upper and lower bounds on the Laplace transform of a shot noise process, ii) a new analog of Hölder's inequality for the setting of harmonic means, and iii) a relation between the Laplace and Mellin transforms of a non-negative random variable. We use the derived bounds to numerically investigate the performance of JCAS networks under varying base station and blockage density. Among several insights, our numerical analysis indicates that network densification improves sensing SINR performance, in contrast to communications.
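Coverage probability and ergodic rate, as used above, can be estimated by Monte Carlo over a Poisson network. The sketch below uses the simplest communication-only special case, not the paper's mmWave JCAS model (no beamforming, blockage, or sensing). Base stations form a Poisson point process, the user at the origin associates with the nearest one, fading is Rayleigh, the path-loss exponent is 4, and noise is ignored. For this case, Andrews, Baccelli and Ganti (2011) give the closed form P(SIR > T) = 1 / (1 + sqrt(T) (pi/2 - arctan(1/sqrt(T)))), which the simulation is compared against. The window radius, intensity and trial count are assumptions.

```python
import numpy as np

def coverage_closed_form(T):
    """P(SIR > T): PPP base stations, nearest-BS association, Rayleigh fading,
    path-loss exponent 4, no noise."""
    rho = np.sqrt(T) * (np.pi / 2 - np.arctan(1 / np.sqrt(T)))
    return 1 / (1 + rho)

def simulate_sir(lam=1.0, radius=30.0, n_trials=4000, alpha=4.0, seed=0):
    """Monte Carlo SIR samples at a typical user placed at the origin.
    Only distances matter for a user at the origin, so angles are not sampled."""
    rng = np.random.default_rng(seed)
    sir = np.empty(n_trials)
    for k in range(n_trials):
        n = rng.poisson(lam * np.pi * radius**2)          # number of base stations in the disk
        r = radius * np.sqrt(rng.uniform(size=n))         # area-uniform distances to the origin
        power = rng.exponential(size=n) * r ** (-alpha)   # Rayleigh fading gives Exp(1) power gains
        serving = np.argmin(r)                            # associate with the nearest base station
        sir[k] = power[serving] / (power.sum() - power[serving])
    return sir

sir = simulate_sir()
for T in (0.5, 1.0, 4.0):
    # Coverage probability: simulated fraction vs. closed form.
    print(T, (sir > T).mean(), coverage_closed_form(T))
# Ergodic rate: spatial average of log2(1 + SIR) in bit/s/Hz.
print("ergodic rate:", np.log2(1 + sir).mean())
```

Truncating the network to a finite disk slightly underestimates interference. With a path-loss exponent of 4 and a radius of 30, this bias is negligible next to the Monte Carlo error.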
Submitted 15 January, 2024; v1 submitted 5 October, 2022;
originally announced October 2022.
-
Characterizing the Performance of Node-Aware Strategies for Irregular Point-to-Point Communication on Heterogeneous Architectures
Authors:
Shelby Lockhart,
Amanda Bienz,
William D. Gropp,
Luke N. Olson
Abstract:
Supercomputer architectures are trending toward higher computational throughput due to the inclusion of heterogeneous compute nodes. These multi-GPU nodes increase on-node computational efficiency, while also increasing the amount of data to be communicated and the number of potential data flow paths. In this work, we characterize the performance of irregular point-to-point communication with MPI on heterogeneous compute environments through performance modeling, demonstrating the limitations of standard communication strategies for both device-aware and staging-through-host communication techniques. The presented models suggest staging communicated data through host processes and then using node-aware communication strategies for high inter-node message counts. Notably, the models also predict that node-aware communication utilizing all available CPU cores to communicate inter-node data is the most performant strategy when communicating with a high number of nodes. Model validation is provided via a case study of irregular point-to-point communication patterns in distributed sparse matrix-vector products. Importantly, we discuss the implications of the model predictions for communication strategy design on emerging supercomputer architectures.
Submitted 13 September, 2022;
originally announced September 2022.
-
Modeling Data Movement Performance on Heterogeneous Architectures
Authors:
Amanda Bienz,
Luke N. Olson,
William D. Gropp,
Shelby Lockhart
Abstract:
The cost of data movement on parallel systems varies greatly with machine architecture, job partition, and nearby jobs. Performance models that accurately capture the cost of data movement provide a tool for analysis, allowing for communication bottlenecks to be pinpointed. Modern heterogeneous architectures yield increased variance in data movement as there are a number of viable paths for inter-GPU communication. In this paper, we present performance models for the various paths of inter-node communication on modern heterogeneous architectures, including the trade-off between GPUDirect communication and copying to CPUs. Furthermore, we present a novel optimization for inter-node communication based on these models, utilizing all available CPU cores per node. Finally, we show associated performance improvements for MPI collective operations.
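The trade-off between GPUDirect communication and copying to CPUs can be illustrated with a postal-style (latency plus per-byte) model. The sketch below is a toy model, not the paper's. Every parameter value is hypothetical. The staged path is assumed to copy device to host, split the message evenly across `cores` CPU processes, and copy host to device on the receiver. Its network term is capped by a NIC injection limit, so adding cores helps only until the NIC saturates.

```python
# Hypothetical model parameters (microseconds and bytes); not measurements from the paper.
ALPHA_GPUDIRECT, BETA_GPUDIRECT = 5.0, 2e-4    # GPU-to-GPU message over the network
ALPHA_COPY, BETA_COPY = 10.0, 2.5e-5           # one device<->host copy
ALPHA_CPU = 2.0                                # CPU-to-CPU message latency
BETA_CORE, BETA_NIC = 4e-4, 5e-5               # per-core send cost, NIC injection limit

def t_gpudirect(nbytes):
    """Modeled time to send nbytes directly between GPUs."""
    return ALPHA_GPUDIRECT + BETA_GPUDIRECT * nbytes

def t_staged(nbytes, cores):
    """Modeled time to copy to host, send split across `cores` CPU processes, copy back.
    The network term is limited by whichever is slower: the cores together or the NIC."""
    copies = 2 * (ALPHA_COPY + BETA_COPY * nbytes)
    network = ALPHA_CPU + nbytes * max(BETA_CORE / cores, BETA_NIC)
    return copies + network

def best_strategy(nbytes, cores):
    return "gpudirect" if t_gpudirect(nbytes) <= t_staged(nbytes, cores) else "staged"

for size in (1e3, 1e5, 1e6):
    # Compare using a single CPU core against using 8 cores for the staged path.
    print(int(size), best_strategy(size, cores=1), best_strategy(size, cores=8))
```

With these made-up numbers, GPUDirect wins for small messages because it avoids two copy latencies. Staging through eight cores wins for large messages, while staging through a single core never does.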
Submitted 16 July, 2021; v1 submitted 20 October, 2020;
originally announced October 2020.
-
Node-Aware Improvements to Allreduce
Authors:
Amanda Bienz,
Luke N. Olson,
William D. Gropp
Abstract:
The MPI_Allreduce collective operation is a core kernel of many parallel codebases, particularly for reductions over a single value per process. The commonly used recursive-doubling allreduce algorithm achieves the lower-bound message count, yielding optimality for small reduction sizes under node-agnostic performance models. However, this algorithm sends duplicate messages between sets of nodes. Node-aware optimizations in MPICH remove duplicate messages through use of a single master process per node, leaving a large number of processes inactive at each inter-node step. In this paper, we present an algorithm that uses the multiple processes available per node to reduce the maximum number of inter-node messages communicated by a single process, improving the performance of allreduce operations, particularly for small message sizes.
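The duplicate inter-node messages of recursive doubling can be seen in a small simulation. The sketch below runs a standard recursive-doubling sum over p = 2^k processes and counts messages between nodes. It assumes SMP-style placement, with rank r on node r // ppn and ppn a power of two. Under that placement, the first log2(ppn) steps stay on-node, and afterwards every rank on a node holds the same partial sum. Each later step therefore sends ppn identical messages between every pair of partner nodes. This illustrates the problem only, not the paper's node-aware algorithm.

```python
def recursive_doubling_allreduce(values):
    """Simulate a recursive-doubling sum allreduce on p = 2^k processes.
    Returns the final per-process values and a list of (step, src, dst) messages."""
    p = len(values)
    assert p & (p - 1) == 0, "this sketch assumes a power-of-two process count"
    vals = list(values)
    messages = []
    dist, step = 1, 0
    while dist < p:
        # Each rank exchanges its partial sum with the partner whose rank differs in one bit.
        vals = [vals[r] + vals[r ^ dist] for r in range(p)]
        messages += [(step, r, r ^ dist) for r in range(p)]
        dist *= 2
        step += 1
    return vals, messages

def inter_node_counts(messages, ppn):
    """Return (inter-node messages, distinct (step, src_node, dst_node) routes),
    assuming rank r runs on node r // ppn."""
    inter = [(s, a // ppn, b // ppn) for s, a, b in messages if a // ppn != b // ppn]
    return len(inter), len(set(inter))

vals, msgs = recursive_doubling_allreduce(list(range(16)))
print(vals[0], inter_node_counts(msgs, ppn=4))  # 120 (32, 8)
```

Here, 32 inter-node messages travel over only 8 distinct node-to-node routes, so each route carries 4 copies of the same data.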
Submitted 21 October, 2019;
originally announced October 2019.
-
Development and Validation of a Deep Learning Algorithm for Improving Gleason Scoring of Prostate Cancer
Authors:
Kunal Nagpal,
Davis Foote,
Yun Liu,
Po-Hsuan Chen,
Ellery Wulczyn,
Fraser Tan,
Niels Olson,
Jenny L. Smith,
Arash Mohtashamian,
James H. Wren,
Greg S. Corrado,
Robert MacDonald,
Lily H. Peng,
Mahul B. Amin,
Andrew J. Evans,
Ankur R. Sangoi,
Craig H. Mermel,
Jason D. Hipp,
Martin C. Stumpe
Abstract:
For prostate cancer patients, the Gleason score is one of the most important prognostic factors, potentially determining treatment independent of the stage. However, Gleason scoring is based on subjective microscopic examination of tumor morphology and suffers from poor reproducibility. Here we present a deep learning system (DLS) for Gleason scoring whole-slide images of prostatectomies. Our system was developed using 112 million pathologist-annotated image patches from 1,226 slides, and evaluated on an independent validation dataset of 331 slides, where the reference standard was established by genitourinary specialist pathologists. On the validation dataset, the mean accuracy among 29 general pathologists was 0.61. The DLS achieved a significantly higher diagnostic accuracy of 0.70 (p=0.002) and trended towards better patient risk stratification in correlations to clinical follow-up data. Our approach could improve the accuracy of Gleason scoring and subsequent therapy decisions, particularly where specialist expertise is unavailable. The DLS also goes beyond the current Gleason system to more finely characterize and quantitate tumor morphology, providing opportunities for refinement of the Gleason system itself.
Submitted 15 November, 2018;
originally announced November 2018.
-
Improving Performance Models for Irregular Point-to-Point Communication
Authors:
Amanda Bienz,
William D. Gropp,
Luke N. Olson
Abstract:
Parallel applications are often unable to take full advantage of emerging parallel architectures due to scaling limitations that arise from inter-process communication. Performance models are used to analyze the sources of communication costs. However, traditional models for point-to-point communication fail to capture the full cost of many irregular operations, such as sparse matrix methods. In this paper, a node-aware model is presented. Furthermore, the model is extended to include communication queue search time as well as an additional parameter estimating network contention. The resulting model is applied to a variety of irregular communication patterns in matrix operations, displaying improved accuracy over traditional models.
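The shape of such a model can be written out as a toy cost function. It has a postal (latency plus per-byte) term whose parameters depend on message locality, plus separate queue-search and contention terms. The sketch is for illustration only. The parameter values are hypothetical, and the functional forms of the extra terms are assumptions, not the paper's fitted model: queue search is taken as quadratic in the number of messages, and contention as linear in the bytes crossing a contended link.

```python
# Hypothetical parameters (seconds, bytes) for three message localities; not fitted values.
PARAMS = {
    "intra_socket": (0.5e-6, 0.1e-9),   # (alpha: per-message latency, beta: per-byte cost)
    "intra_node":   (1.0e-6, 0.2e-9),
    "inter_node":   (2.0e-6, 0.5e-9),
}

def model_time(msgs, gamma=0.0, delta=0.0, contended_bytes=0.0):
    """Modeled cost for one process sending `msgs`, a list of (locality, nbytes).

    postal:     sum of alpha + beta * nbytes, with parameters chosen by locality
    queue:      gamma * n**2, an assumed form for message-queue search time
    contention: delta * contended_bytes, an assumed linear contention penalty
    """
    postal = sum(PARAMS[loc][0] + PARAMS[loc][1] * nbytes for loc, nbytes in msgs)
    queue = gamma * len(msgs) ** 2
    contention = delta * contended_bytes
    return postal + queue + contention

msgs = [("inter_node", 1000), ("intra_node", 1000)]
print(model_time(msgs))                 # postal terms only
print(model_time(msgs, gamma=1e-7))     # plus the assumed queue-search term
```

Separating the terms makes it possible to tell whether a pattern is dominated by per-message latency, by message volume, or by the number of outstanding messages.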
Submitted 6 June, 2018;
originally announced June 2018.
-
Scaling Structured Multigrid to 500K+ Cores through Coarse-Grid Redistribution
Authors:
Andrew Reisner,
Luke N. Olson,
J. David Moulton
Abstract:
The efficient solution of sparse linear systems resulting from the discretization of partial differential equations is crucial to the performance of many physics-based simulations. The algorithmic optimality of multilevel approaches for common discretizations makes them a good candidate for an efficient parallel solver. Yet, modern architectures for high-performance computing systems continue to challenge the parallel scalability of multilevel solvers. While algebraic multigrid methods are robust for solving a variety of problems, the increasing importance of data locality and cost of data movement in modern architectures motivates the need to carefully exploit structure in the problem.
Robust logically structured variational multigrid methods, such as Black Box Multigrid (BoxMG), maintain structure throughout the multigrid hierarchy. This avoids indirection and increased coarse-grid communication costs typical in parallel algebraic multigrid. Nevertheless, the parallel scalability of structured multigrid is challenged by coarse-grid problems where the overhead in communication dominates computation. In this paper, an algorithm is introduced for redistributing coarse-grid problems through incremental agglomeration. Guided by a predictive performance model, this algorithm provides robust redistribution decisions for structured multilevel solvers.
A two-dimensional diffusion problem is used to demonstrate the significant gain in performance of this algorithm over the previous approach that used agglomeration to one processor. In addition, the parallel scalability of this approach is demonstrated on two large-scale computing systems, with solves on up to 500K+ cores.
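The multilevel idea behind these solvers can be illustrated with a serial geometric V-cycle. The sketch below is a minimal 1-D Poisson example chosen for illustration. It uses a three-point stencil, weighted Jacobi smoothing, full-weighting restriction and linear interpolation. It is not BoxMG, runs on one process, and does not implement the paper's coarse-grid redistribution. It does show the structured coarse grids (every other point) that such solvers keep throughout the hierarchy.

```python
import numpy as np

def apply_A(u, h):
    """Three-point stencil for -u'' with zero Dirichlet boundary values."""
    up = np.pad(u, 1)
    return (2 * up[1:-1] - up[:-2] - up[2:]) / h**2

def jacobi(u, f, h, sweeps=2, omega=2 / 3):
    """Weighted Jacobi smoothing; the diagonal of the stencil is 2 / h^2."""
    for _ in range(sweeps):
        u = u + omega * (h**2 / 2) * (f - apply_A(u, h))
    return u

def restrict(r):
    """Full weighting: coarse point i coincides with fine point 2i + 1."""
    return 0.25 * (r[0:-2:2] + 2 * r[1:-1:2] + r[2::2])

def prolong(e, n_fine):
    """Linear interpolation from the coarse grid back to the fine grid."""
    ef = np.zeros(n_fine)
    ef[1::2] = e                           # fine points that coincide with coarse points
    ep = np.pad(e, 1)
    ef[0::2] = 0.5 * (ep[:-1] + ep[1:])    # fine points between two coarse points
    return ef

def v_cycle(u, f, h):
    """One V(2,2)-cycle on a grid with 2^k - 1 interior points."""
    n = len(u)
    if n == 1:
        return f * h**2 / 2                # exact solve of the 1x1 system (2 / h^2) u = f
    u = jacobi(u, f, h)                    # pre-smoothing damps high-frequency error
    rc = restrict(f - apply_A(u, h))       # move the residual to the coarser grid
    ec = v_cycle(np.zeros_like(rc), rc, 2 * h)
    u = u + prolong(ec, n)                 # coarse-grid correction removes smooth error
    return jacobi(u, f, h)                 # post-smoothing

n = 2**7 - 1
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)
f = np.pi**2 * np.sin(np.pi * x)          # exact solution of -u'' = f is u = sin(pi x)
u = np.zeros(n)
for _ in range(12):
    u = v_cycle(u, f, h)
print(np.linalg.norm(f - apply_A(u, h)) / np.linalg.norm(f))
```

Each level does work proportional to its size, and the sizes halve, so a V-cycle costs O(n). On a parallel machine, however, the coarsest levels hold very few points per process. This is where communication starts to dominate computation and redistribution becomes worthwhile.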
Submitted 6 March, 2018;
originally announced March 2018.
-
Node Aware Sparse Matrix-Vector Multiplication
Authors:
Amanda Bienz,
William D. Gropp,
Luke N. Olson
Abstract:
The sparse matrix-vector multiply (SpMV) operation is a key computational kernel in many simulations and linear solvers. The large communication requirements associated with a reference implementation of a parallel SpMV result in poor parallel scalability. The cost of communication depends on the physical locations of the send and receive processes: messages injected into the network are more costly than messages sent between processes on the same node. In this paper, a node aware parallel SpMV (NAPSpMV) is introduced to exploit knowledge of the system topology, specifically the node-processor layout, to reduce costs associated with communication. The values of the input vector are redistributed to minimize both the number and the size of messages that are injected into the network during a SpMV, leading to a reduction in communication costs. A variety of computational experiments that highlight the efficiency of this approach are presented.
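The message-count effect of node awareness can be sketched in a few lines. Here a communication pattern is a set of (source rank, destination rank) pairs, and ranks are assumed to be placed SMP-style, with rank r on node r // ppn. A standard SpMV sends one network message per off-node process pair. A node-aware scheme first gathers data on-node, sends one aggregated message per (source node, destination node) pair, and then redistributes on the receiving node. This sketch counts only inter-node messages. It ignores the intra-node steps and the message-size reductions that NAPSpMV also targets, and the function name is hypothetical.

```python
def count_inter_node_messages(pattern, ppn):
    """pattern: iterable of distinct (src_rank, dst_rank) pairs that exchange vector values.
    Assumes rank r runs on node r // ppn. Returns (standard, node_aware) inter-node counts."""
    inter = [(s // ppn, d // ppn) for s, d in pattern if s // ppn != d // ppn]
    standard = len(inter)          # every off-node process pair sends its own message
    node_aware = len(set(inter))   # one aggregated message per (src_node, dst_node) pair
    return standard, node_aware

# Two nodes with 4 ranks each; every rank needs values from every rank on the other node.
pattern = [(s, d) for s in range(8) for d in range(8) if s // 4 != d // 4]
print(count_inter_node_messages(pattern, ppn=4))  # (32, 2)
```

The saving grows with the number of processes per node, because injected messages are the expensive ones in the cost model described above.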
Submitted 15 November, 2017; v1 submitted 23 December, 2016;
originally announced December 2016.
-
Reducing Parallel Communication in Algebraic Multigrid through Sparsification
Authors:
Amanda Bienz,
Robert D. Falgout,
William Gropp,
Luke N. Olson,
Jacob B. Schroder
Abstract:
Algebraic multigrid (AMG) is an O(n) solution process for many large sparse linear systems. A hierarchy of progressively coarser grids is constructed that uses complementary relaxation and interpolation operators. High-energy error is reduced by relaxation, while low-energy error is mapped to coarse grids and reduced there. However, large parallel communication costs often limit parallel scalability. As the multigrid hierarchy is formed, each coarse matrix is computed through a triple matrix product. The resulting coarse grids often have significantly more nonzeros per row than the original fine-grid operator, thereby generating high parallel communication costs on coarse levels. In this paper, we introduce a method that systematically removes entries in coarse-grid matrices after the hierarchy is formed, leading to reduced communication costs. We sparsify by removing weakly connected or unimportant entries in the matrix, leading to improved solve time. The main trade-off is that if the heuristic identifying unimportant entries is used too aggressively, then AMG convergence can suffer. To counteract this, the original hierarchy is retained, allowing entries to be reintroduced into the solver hierarchy if convergence is too slow. This enables a balance between communication cost and convergence, as necessary. In this paper, we present new algorithms for reducing communication and a number of computational experiments in support.
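Sparsification of a coarse-grid matrix can be sketched with a simple strength threshold. An off-diagonal entry is treated as weak if its magnitude is below theta times the largest off-diagonal magnitude in its row, and it is then dropped. Adding each dropped value to the diagonal ("lumping") so that row sums are preserved is one common choice, assumed here; the abstract does not say how removed entries are handled, and the paper's strength measure may differ. A dense NumPy array is used for clarity. The original matrix is left untouched, in the spirit of retaining the original hierarchy so entries can be reintroduced.

```python
import numpy as np

def sparsify(A, theta):
    """Return a copy of A with weak off-diagonal entries removed.
    An entry is weak if |a_ij| < theta * max_{k != i} |a_ik|; each removed value is
    added to the diagonal (lumping) so every row sum is unchanged. A itself is untouched."""
    S = np.array(A, dtype=float)
    for i in range(S.shape[0]):
        off = np.abs(S[i])
        off[i] = 0.0                                  # ignore the diagonal when measuring strength
        if off.max() == 0.0:
            continue
        weak = (off > 0.0) & (off < theta * off.max())
        S[i, i] += S[i, weak].sum()                   # lump dropped entries onto the diagonal
        S[i, weak] = 0.0
    return S

# A coarse-grid-like matrix with a few small "fill" entries (hypothetical values).
A = np.array([
    [4.0, -1.0, -0.01, 0.0],
    [-1.0, 4.0, -1.0, -0.02],
    [-0.01, -1.0, 4.0, -1.0],
    [0.0, -0.02, -1.0, 4.0],
])
S = sparsify(A, theta=0.1)
print(np.count_nonzero(A), np.count_nonzero(S))  # 14 10
```

Fewer nonzeros per row means fewer off-process columns and therefore fewer messages in a parallel matrix-vector product. A larger theta drops more entries but, as the abstract notes, can slow AMG convergence. Row-wise thresholds can also make a symmetric matrix nonsymmetric.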
Submitted 14 December, 2015;
originally announced December 2015.