Search | arXiv e-print repository

Tyche: Stochastic In-Context Learning for Medical Image Segmentation

Authors: Marianne Rakic, Hallee E. Wong, Jose Javier Gonzalez Ortiz, Beth Cimini, John Guttag, Adrian V. Dalca

Abstract: Existing learning-based solutions to medical image segmentation have two important shortcomings. First, for most new segmentation tasks, a new model has to be trained or fine-tuned. This requires extensive resources and machine learning expertise, and is therefore often infeasible for medical researchers and clinicians. Second, most existing segmentation methods produce a single deterministic segmentation mask for a given image. In practice, however, there is often considerable uncertainty about what constitutes the correct segmentation, and different expert annotators will often segment the same image differently. We tackle both of these problems with Tyche, a model that uses a context set to generate stochastic predictions for previously unseen tasks without the need to retrain. Tyche differs from other in-context segmentation methods in two important ways. (1) We introduce a novel convolution block architecture that enables interactions among predictions. (2) We introduce in-context test-time augmentation, a new mechanism to provide prediction stochasticity. When combined with appropriate model design and loss functions, Tyche can predict a set of plausible diverse segmentation candidates for new or unseen medical images and segmentation tasks without the need to retrain.

Submitted 24 January, 2024; originally announced January 2024.

arXiv:2312.07381

ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image

Authors: Hallee E. Wong, Marianne Rakic, John Guttag, Adrian V. Dalca

Abstract: Biomedical image segmentation is a crucial part of both scientific research and clinical care. With enough labelled data, deep learning models can be trained to accurately automate specific biomedical image segmentation tasks. However, manually segmenting images to create training data is highly labor-intensive and requires domain expertise. We present ScribblePrompt, a flexible neural-network-based interactive segmentation tool for biomedical imaging that enables human annotators to segment previously unseen structures using scribbles, clicks, and bounding boxes. Through rigorous quantitative experiments, we demonstrate that given comparable amounts of interaction, ScribblePrompt produces more accurate segmentations than previous methods on datasets unseen during training. In a user study with domain experts, ScribblePrompt reduced annotation time by 28% while improving Dice by 15% compared to the next best method. ScribblePrompt's success rests on a set of careful design decisions. These include a training strategy that incorporates both a highly diverse set of images and tasks, novel algorithms for simulated user interactions and labels, and a network that enables fast inference. We showcase ScribblePrompt in an online demo and provide code at https://scribbleprompt.csail.mit.edu

Submitted 12 April, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: Project Website: https://scribbleprompt.csail.mit.edu Keywords: Interactive Segmentation, Medical Imaging, Segment Anything Model, SAM, Scribble Annotations, Prompt

arXiv:2307.11315

GIST: Generating Image-Specific Text for Fine-grained Object Classification

Authors: Kathleen M. Lewis, Emily Mu, Adrian V. Dalca, John Guttag

Abstract: Recent vision-language models outperform vision-only models on many image classification tasks. However, because of the absence of paired text/image descriptions, it remains difficult to fine-tune these models for fine-grained image classification. In this work, we propose a method, GIST, for generating image-specific fine-grained text descriptions from image-only datasets, and show that these text descriptions can be used to improve classification. Key parts of our method include (1) prompting a pretrained large language model with domain-specific prompts to generate diverse fine-grained text descriptions for each class and (2) using a pretrained vision-language model to match each image to label-preserving text descriptions that capture relevant visual features in the image. We demonstrate the utility of GIST by fine-tuning vision-language models on the image-and-generated-text pairs to learn an aligned vision-language representation space for improved classification. We evaluate our learned representation space in full-shot and few-shot scenarios across four diverse fine-grained classification datasets, each from a different domain. Our method achieves an average improvement of $4.1\%$ in accuracy over CLIP linear probes and an average of $1.1\%$ improvement in accuracy over the previous state-of-the-art image-text classification method on the full-shot datasets. Our method achieves similar improvements across few-shot regimes. Code is available at https://github.com/emu1729/GIST.

Submitted 4 August, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

Comments: The first two authors contributed equally to this work and are listed in alphabetical order

arXiv:2307.10923

Sequential Multi-Dimensional Self-Supervised Learning for Clinical Time Series

Authors: Aniruddh Raghu, Payal Chandak, Ridwan Alam, John Guttag, Collin M. Stultz

Abstract: Self-supervised learning (SSL) for clinical time series data has received significant attention in recent literature, since these data are highly rich and provide important information about a patient's physiological state. However, most existing SSL methods for clinical time series are limited in that they are designed for unimodal time series, such as a sequence of structured features (e.g., lab values and vital signs) or an individual high-dimensional physiological signal (e.g., an electrocardiogram). These existing methods cannot be readily extended to model time series that exhibit multimodality, with structured features and high-dimensional data being recorded at each timestep in the sequence. In this work, we address this gap and propose a new SSL method -- Sequential Multi-Dimensional SSL -- where an SSL loss is applied both at the level of the entire sequence and at the level of the individual high-dimensional data points in the sequence in order to better capture information at both scales. Our strategy is agnostic to the specific form of loss function used at each level -- it can be contrastive, as in SimCLR, or non-contrastive, as in VICReg. We evaluate our method on two real-world clinical datasets, where the time series contains sequences of (1) high-frequency electrocardiograms and (2) structured data from lab values and vital signs. Our experimental results indicate that pre-training with our method and then fine-tuning on downstream tasks improves performance over baselines on both datasets, and in several settings, can lead to improvements across different self-supervised loss functions.

Submitted 20 July, 2023; originally announced July 2023.

Comments: ICML 2023

arXiv:2307.02712

Multi-Similarity Contrastive Learning

Authors: Emily Mu, John Guttag, Maggie Makar

Abstract: Given a similarity metric, contrastive methods learn a representation in which examples that are similar are pulled together and examples that are dissimilar are pushed apart. Contrastive learning techniques have been utilized extensively to learn representations for tasks ranging from image classification to caption generation. However, existing contrastive learning approaches can fail to generalize because they do not take into account the possibility of different similarity relations. In this paper, we propose a novel multi-similarity contrastive loss (MSCon) that learns generalizable embeddings by jointly utilizing supervision from multiple metrics of similarity. Our method automatically learns contrastive similarity weightings based on the uncertainty in the corresponding similarity, down-weighting uncertain tasks and leading to better out-of-domain generalization to new tasks. We show empirically that networks trained with MSCon outperform state-of-the-art baselines on in-domain and out-of-domain settings.

Submitted 5 July, 2023; originally announced July 2023.
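
The general idea of contrastive learning described in the abstract above (similar examples close together, dissimilar examples far apart) can be sketched with the classic pairwise margin loss. This is a minimal illustration only: it is not MSCon, it uses a single similarity relation, and the function names and the `margin` parameter are assumptions made for the example.

```python
import math

def euclidean(a, b):
    # Euclidean distance between two embedding vectors of equal length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pairwise_contrastive_loss(a, b, similar, margin=1.0):
    """Classic pairwise margin contrastive loss (not the MSCon loss).

    Similar pairs are penalized by their squared distance; dissimilar
    pairs are penalized only while they are closer than `margin`.
    """
    d = euclidean(a, b)
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2

# A similar pair that sits far apart incurs a large loss ...
loss_similar_far = pairwise_contrastive_loss([0.0, 0.0], [3.0, 4.0], similar=True)
# ... while a dissimilar pair beyond the margin incurs none.
loss_dissimilar_far = pairwise_contrastive_loss([0.0, 0.0], [3.0, 4.0], similar=False)
```

A multi-similarity method would combine several such terms, one per similarity relation; how the abstract's uncertainty-based weights are learned is not specified here and is left out of the sketch.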

arXiv:2304.09270

Coarse race data conceals disparities in clinical risk score performance

Authors: Rajiv Movva, Divya Shanmugam, Kaihua Hou, Priya Pathak, John Guttag, Nikhil Garg, Emma Pierson

Abstract: Healthcare data in the United States often records only a patient's coarse race group: for example, both Indian and Chinese patients are typically coded as "Asian." It is unknown, however, whether this coarse coding conceals meaningful disparities in the performance of clinical risk scores across granular race groups. Here we show that it does. Using data from 418K emergency department visits, we assess clinical risk score performance disparities across 26 granular groups for three outcomes, five risk scores, and four performance metrics. Across outcomes and metrics, we show that the risk scores exhibit significant granular performance disparities within coarse race groups. In fact, variation in performance within coarse groups often *exceeds* the variation between coarse groups. We explore why these disparities arise, finding that outcome rates, feature distributions, and the relationships between features and outcomes all vary significantly across granular groups. Our results suggest that healthcare providers, hospital systems, and machine learning researchers should strive to collect, release, and use granular race data in place of coarse race data, and that existing analyses may significantly underestimate racial disparities in performance.

Submitted 24 August, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

Comments: Published at MLHC 2023. v2 includes minor changes from the camera-ready, such as a link to code. Code is available at https://github.com/rmovva/granular-race-disparities_MLHC23

ACM Class: J.3; K.4.2

arXiv:2304.07645

Magnitude Invariant Parametrizations Improve Hypernetwork Learning

Authors: Jose Javier Gonzalez Ortiz, John Guttag, Adrian Dalca

Abstract: Hypernetworks, neural networks that predict the parameters of another neural network, are powerful models that have been successfully used in diverse applications from image generation to multi-task learning. Unfortunately, existing hypernetworks are often challenging to train. Training typically converges far more slowly than for non-hypernetwork models, and the rate of convergence can be very sensitive to hyperparameter choices. In this work, we identify a fundamental and previously unidentified problem that contributes to the challenge of training hypernetworks: a magnitude proportionality between the inputs and outputs of the hypernetwork. We demonstrate both analytically and empirically that this can lead to unstable optimization, thereby slowing down convergence, and sometimes even preventing any learning. We present a simple solution to this problem using a revised hypernetwork formulation that we call Magnitude Invariant Parametrizations (MIP). We demonstrate the proposed solution on several hypernetwork tasks, where it consistently stabilizes training and achieves faster convergence. Furthermore, we perform a comprehensive ablation study including choices of activation function, normalization strategies, input dimensionality, and hypernetwork architecture; and find that MIP improves training in all scenarios. We provide easy-to-use code that can turn existing networks into MIP-based hypernetworks.

Submitted 29 June, 2023; v1 submitted 15 April, 2023; originally announced April 2023.

Comments: Source code at https://github.com/JJGO/hyperlight
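
The definition in the abstract above (a hypernetwork is a network that predicts the parameters of another network) can be illustrated with a deliberately tiny sketch. Everything here is an assumption for illustration: the scalar input `gamma`, the `theta` dictionary, and both function names are hypothetical, and this is neither the MIP formulation nor the hyperlight API.

```python
def hypernetwork(gamma, theta):
    """Toy hypernetwork: maps a scalar input `gamma` to the parameters
    (w, b) of a primary model. `theta` holds the hypernetwork's own
    learnable parameters (hypothetical names)."""
    w = theta["w_scale"] * gamma + theta["w_bias"]
    b = theta["b_scale"] * gamma + theta["b_bias"]
    return w, b

def primary_network(x, w, b):
    # Primary model whose parameters are predicted, not learned directly.
    return w * x + b

theta = {"w_scale": 0.5, "w_bias": 1.0, "b_scale": -1.0, "b_bias": 0.0}

# Predict the primary model's parameters for one input, then run it.
w, b = hypernetwork(2.0, theta)
y = primary_network(3.0, w, b)
```

In this toy version the predicted parameters grow linearly with `gamma`, which loosely mirrors the kind of input-output magnitude coupling the abstract identifies as a training problem; the paper's actual analysis and fix are not reproduced here.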

arXiv:2304.06131

UniverSeg: Universal Medical Image Segmentation

Authors: Victor Ion Butoi, Jose Javier Gonzalez Ortiz, Tianyu Ma, Mert R. Sabuncu, John Guttag, Adrian V. Dalca

Abstract: While deep learning models have become the predominant method for medical image segmentation, they are typically not capable of generalizing to unseen segmentation tasks involving new anatomies, image modalities, or labels. Given a new segmentation task, researchers generally have to train or fine-tune models, which is time-consuming and poses a substantial barrier for clinical researchers, who often lack the resources and expertise to train neural networks. We present UniverSeg, a method for solving unseen medical segmentation tasks without additional training. Given a query image and an example set of image-label pairs that define a new segmentation task, UniverSeg employs a new Cross-Block mechanism to produce accurate segmentation maps without the need for additional training. To achieve generalization to new tasks, we have gathered and standardized a collection of 53 open-access medical segmentation datasets with over 22,000 scans, which we refer to as MegaMedical. We used this collection to train UniverSeg on a diverse set of anatomies and imaging modalities. We demonstrate that UniverSeg substantially outperforms several related methods on unseen tasks, and thoroughly analyze and draw insights about important aspects of the proposed system. The UniverSeg source code and model weights are freely available at https://universeg.csail.mit.edu

Submitted 12 April, 2023; originally announced April 2023.

Comments: Victor and Jose Javier contributed equally to this work. Project Website: https://universeg.csail.mit.edu

arXiv:2304.05448

Scale-Space Hypernetworks for Efficient Biomedical Imaging

Authors: Jose Javier Gonzalez Ortiz, John Guttag, Adrian Dalca

Abstract: Convolutional Neural Networks (CNNs) are the predominant model used for a variety of medical image analysis tasks. At inference time, these models are computationally intensive, especially with volumetric data. In principle, it is possible to trade accuracy for computational efficiency by manipulating the rescaling factor in the downsample and upsample layers of CNN architectures. However, properly exploring the accuracy-efficiency trade-off is prohibitively expensive with existing models. To address this, we introduce Scale-Space HyperNetworks (SSHN), a method that learns a spectrum of CNNs with varying internal rescaling factors. A single SSHN characterizes an entire Pareto accuracy-efficiency curve of models that match, and occasionally surpass, the outcomes of training many separate networks with fixed rescaling factors. We demonstrate the proposed approach in several medical image analysis applications, comparing SSHN against strategies with both fixed and dynamic rescaling factors. We find that SSHN consistently provides a better accuracy-efficiency trade-off at a fraction of the training cost. Trained SSHNs enable the user to quickly choose a rescaling factor that appropriately balances accuracy and computational efficiency for their particular needs at inference.

Submitted 29 June, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

Comments: Code available at https://github.com/JJGO/scale-space-hypernetworks

arXiv:2211.02892

SizeGAN: Improving Size Representation in Clothing Catalogs

Authors: Kathleen M. Lewis, John Guttag

Abstract: Online clothing catalogs lack diversity in body shape and garment size. Brands commonly display their garments on models of one or two sizes, rarely including plus-size models. To our knowledge, our paper presents the first method for generating images of garments and models in a new target size to tackle the size under-representation problem. Our primary technical contribution is a conditional generative adversarial network that learns deformation fields at multiple resolutions to realistically change the size of models and garments. Results from our two user studies show SizeGAN outperforms alternative methods along three dimensions -- realism, garment faithfulness, and size -- which are all important for real-world use.

Submitted 26 June, 2023; v1 submitted 5 November, 2022; originally announced November 2022.

arXiv:2207.04312

At the Intersection of Deep Learning and Conceptual Art: The End of Signature

Authors: Divya Shanmugam, Katie Lewis, Jose Javier Gonzalez-Ortiz, Agnieszka Kurant, John Guttag

Abstract: MIT wanted to commission a large-scale artwork that would serve to 'illuminate a new campus gateway, inaugurate a space of exchange between MIT and Cambridge, and inspire our students, faculty, visitors, and the surrounding community to engage with art in new ways and to have art be part of their daily lives.' Among other things, the art was to reflect the fact that scientific discovery is often the result of many individual contributions, both acknowledged and unacknowledged. In this work, a group of computer scientists collaborated with a conceptual artist to produce a collective signature, or a signature learned from contributions of an entire community. After collecting signatures from two communities -- the university, and the surrounding city -- the computer scientists developed generative models and a human-in-the-loop feedback process to work with the artist to create an original signature-like structure representative of each community. These signatures are now large-scale steel, LED and neon light sculptures that appear to sign two new buildings in Cambridge, MA.

Submitted 9 July, 2022; originally announced July 2022.

arXiv:2206.13607

Improved Text Classification via Test-Time Augmentation

Authors: Helen Lu, Divya Shanmugam, Harini Suresh, John Guttag

Abstract: Test-time augmentation (TTA) -- the aggregation of predictions across transformed examples of test inputs -- is an established technique to improve the performance of image classification models. Importantly, TTA can be used to improve model performance post-hoc, without additional training. Although TTA can be applied to any data modality, it has seen limited adoption in NLP due in part to the difficulty of identifying label-preserving transformations. In this paper, we present augmentation policies that yield significant accuracy improvements with language models. A key finding is that augmentation policy design -- for instance, the number of samples generated from a single, non-deterministic augmentation -- has a considerable impact on the benefit of TTA. Experiments across a binary classification task and dataset show that test-time augmentation can deliver consistent improvements over current state-of-the-art approaches.

Submitted 27 June, 2022; originally announced June 2022.
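
The definition of test-time augmentation in the abstract above (aggregating predictions across transformed copies of a test input) can be sketched as follows. This is a minimal illustration under stated assumptions: the keyword-based `toy_model`, the three augmentation functions, and simple averaging as the aggregation rule are all hypothetical choices, not the paper's augmentation policies.

```python
def tta_predict(model, x, augmentations):
    """Average class-probability vectors over augmented copies of x."""
    preds = [model(aug(x)) for aug in augmentations]
    n_classes = len(preds[0])
    return [sum(p[c] for p in preds) / len(preds) for c in range(n_classes)]

def toy_model(text):
    # Hypothetical stand-in classifier: returns [P(negative), P(positive)].
    return [0.2, 0.8] if "good" in text.split() else [0.6, 0.4]

def identity(text):
    return text

def lowercase(text):
    return text.lower()

def swap_synonym(text):
    # Assumed label-preserving transformation: replace "great" with "good".
    return text.lower().replace("great", "good")

# The un-augmented prediction leans negative; averaging over the
# augmented copies shifts the aggregate toward the positive class.
single = toy_model("Great movie")
probs = tta_predict(toy_model, "Great movie", [identity, lowercase, swap_synonym])
```

No retraining is involved: the model is fixed and only the inputs are transformed, which is the post-hoc property the abstract emphasizes.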

arXiv:2206.02958

doi 10.1145/3593013.3593997

Saliency Cards: A Framework to Characterize and Compare Saliency Methods

Authors: Angie Boggust, Harini Suresh, Hendrik Strobelt, John V. Guttag, Arvind Satyanarayan

Abstract: Saliency methods are a common class of machine learning interpretability techniques that calculate how important each input feature is to a model's output. We find that, with the rapid pace of development, users struggle to stay informed of the strengths and limitations of new methods and, thus, choose methods for unprincipled reasons (e.g., popularity). Moreover, despite a corresponding rise in evaluation metrics, existing approaches assume universal desiderata for saliency methods (e.g., faithfulness) that do not account for diverse user needs. In response, we introduce saliency cards: structured documentation of how saliency methods operate and their performance across a battery of evaluative metrics. Through a review of 25 saliency method papers and 33 method evaluations, we identify 10 attributes that users should account for when choosing a method. We group these attributes into three categories that span the process of computing and interpreting saliency: methodology, or how the saliency is calculated; sensitivity, or the relationship between the saliency and the underlying model and data; and, perceptibility, or how an end user ultimately interprets the result. By collating this information, saliency cards allow users to more holistically assess and compare the implications of different methods. Through nine semi-structured interviews with users from various backgrounds, including researchers, radiologists, and computational biologists, we find that saliency cards provide a detailed vocabulary for discussing individual methods and allow for a more systematic selection of task-appropriate methods. Moreover, with saliency cards, we are able to analyze the research landscape in a more structured fashion to identify opportunities for new methods and evaluation metrics for unmet user needs.

Submitted 30 May, 2023; v1 submitted 6 June, 2022; originally announced June 2022.

Comments: Published at FAccT 2023, 19 pages, 8 figures, 2 tables

arXiv:2204.04360

Data Augmentation for Electrocardiograms

Authors: Aniruddh Raghu, Divya Shanmugam, Eugene Pomerantsev, John Guttag, Collin M. Stultz

Abstract: Neural network models have demonstrated impressive performance in predicting pathologies and outcomes from the 12-lead electrocardiogram (ECG). However, these models often need to be trained with large, labelled datasets, which are not available for many predictive tasks of interest. In this work, we perform an empirical study examining whether training-time data augmentation methods can be used to improve performance on such data-scarce ECG prediction problems. We investigate how data augmentation strategies impact model performance when detecting cardiac abnormalities from the ECG. Motivated by our finding that the effectiveness of existing augmentation strategies is highly task-dependent, we introduce a new method, TaskAug, which defines a flexible augmentation policy that is optimized on a per-task basis. We outline an efficient learning algorithm to do so that leverages recent work in nested optimization and implicit differentiation. In experiments, considering three datasets and eight predictive tasks, we find that TaskAug is competitive with or improves on prior work, and the learned policies shed light on what transformations are most effective for different tasks. We distill key insights from our experimental evaluation, generating a set of best practices for applying data augmentation to ECG prediction problems.

Submitted 8 April, 2022; originally announced April 2022.

Comments: Conference on Health, Inference, and Learning (CHIL) 2022

arXiv:2203.16680

doi 10.59275/j.melba.2022-74f1

Learning the Effect of Registration Hyperparameters with HyperMorph

Authors: Andrew Hoopes, Malte Hoffmann, Douglas N. Greve, Bruce Fischl, John Guttag, Adrian V. Dalca

Abstract: We introduce HyperMorph, a framework that facilitates efficient hyperparameter tuning in learning-based deformable image registration. Classical registration algorithms perform an iterative pair-wise optimization to compute a deformation field that aligns two images. Recent learning-based approaches leverage large image datasets to learn a function that rapidly estimates a deformation for a given image pair. In both strategies, the accuracy of the resulting spatial correspondences is strongly influenced by the choice of certain hyperparameter values. However, an effective hyperparameter search consumes substantial time and human effort as it often involves training multiple models for different fixed hyperparameter values and may lead to suboptimal registration. We propose an amortized hyperparameter learning strategy to alleviate this burden by learning the impact of hyperparameters on deformation fields. We design a meta network, or hypernetwork, that predicts the parameters of a registration network for input hyperparameters, thereby comprising a single model that generates the optimal deformation field corresponding to given hyperparameter values. This strategy enables fast, high-resolution hyperparameter search at test-time, reducing the inefficiency of traditional approaches while increasing flexibility. We also demonstrate additional benefits of HyperMorph, including enhanced robustness to model initialization and the ability to rapidly identify optimal hyperparameter values specific to a dataset, image contrast, task, or even anatomical region, all without the need to retrain models. We make our code publicly available at http://hypermorph.voxelmorph.net.

Submitted 30 March, 2022; originally announced March 2022.

Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) at https://www.melba-journal.org

arXiv:2106.10860

Multiplying Matrices Without Multiplying

Authors: Davis Blalock, John Guttag

Abstract: Multiplying matrices is among the most fundamental and compute-intensive operations in machine learning. Consequently, there has been significant work on efficiently approximating matrix multiplies. We introduce a learning-based algorithm for this task that greatly outperforms existing methods. Experiments using hundreds of matrices from diverse domains show that it often runs $100\times$ faster than exact matrix products and $10\times$ faster than current approximate methods. In the common case that one matrix is known ahead of time, our method also has the interesting property that it requires zero multiply-adds. These results suggest that a mixture of hashing, averaging, and byte shuffling, the core operations of our method, could be a more promising building block for machine learning than the sparsified, factorized, and/or scalar quantized matrix products that have recently been the focus of substantial research and hardware investment.

Submitted 21 June, 2021; originally announced June 2021.

Comments: To appear at ICML 2021

Journal ref: PMLR 139:992-1004, 2021

arXiv:2103.02768 [pdf, other]

Learning to Predict with Supporting Evidence: Applications to Clinical Risk Prediction

Authors: Aniruddh Raghu, John Guttag, Katherine Young, Eugene Pomerantsev, Adrian V. Dalca, Collin M. Stultz

Abstract: The impact of machine learning models on healthcare will depend on the degree of trust that healthcare professionals place in the predictions made by these models. In this paper, we present a method to provide people with clinical expertise with domain-relevant evidence about why a prediction should be trusted. We first design a probabilistic model that relates meaningful latent concepts to prediction targets and observed data. Inference of latent variables in this model corresponds to both making a prediction and providing supporting evidence for that prediction. We present a two-step process to efficiently approximate inference: (i) estimating model parameters using variational learning, and (ii) approximating maximum a posteriori estimation of latent variables in the model using a neural network, trained with an objective derived from the probabilistic model. We demonstrate the method on the task of predicting mortality risk for patients with cardiovascular disease. Specifically, using electrocardiogram and tabular data as input, we show that our approach provides appropriate domain-relevant supporting evidence for accurate predictions.

Submitted 3 March, 2021; originally announced March 2021.

Comments: ACM Conference on Health, Learning, and Inference 2021

arXiv:2102.08540 [pdf, other]

Intuitively Assessing ML Model Reliability through Example-Based Explanations and Editing Model Inputs

Authors: Harini Suresh, Kathleen M. Lewis, John V. Guttag, Arvind Satyanarayan

Abstract: Interpretability methods aim to help users build trust in and understand the capabilities of machine learning models. However, existing approaches often rely on abstract, complex visualizations that poorly map to the task at hand or require non-trivial ML expertise to interpret. Here, we present two visual analytics modules that facilitate an intuitive assessment of model reliability. To help users better characterize and reason about a model's uncertainty, we visualize raw and aggregate information about a given input's nearest neighbors. Using an interactive editor, users can manipulate this input in semantically-meaningful ways, determine the effect on the output, and compare against their prior expectations. We evaluate our interface using an electrocardiogram beat classification case study. Compared to a baseline feature importance interface, we find that 14 physicians are better able to align the model's uncertainty with domain-relevant factors and build intuition about its capabilities and limitations.

Submitted 9 July, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

arXiv:2101.01035 [pdf, other]

HyperMorph: Amortized Hyperparameter Learning for Image Registration

Authors: Andrew Hoopes, Malte Hoffmann, Bruce Fischl, John Guttag, Adrian V. Dalca

Abstract: We present HyperMorph, a learning-based strategy for deformable image registration that removes the need to tune important registration hyperparameters during training. Classical registration methods solve an optimization problem to find a set of spatial correspondences between two images, while learning-based methods leverage a training dataset to learn a function that generates these correspondences. The quality of the results for both types of techniques depends greatly on the choice of hyperparameters. Unfortunately, hyperparameter tuning is time-consuming and typically involves training many separate models with various hyperparameter values, potentially leading to suboptimal results. To address this inefficiency, we introduce amortized hyperparameter learning for image registration, a novel strategy to learn the effects of hyperparameters on deformation fields. The proposed framework learns a hypernetwork that takes in an input hyperparameter and modulates a registration network to produce the optimal deformation field for that hyperparameter value. In effect, this strategy trains a single, rich model that enables rapid, fine-grained discovery of hyperparameter values from a continuous interval at test-time. We demonstrate that this approach can be used to optimize multiple hyperparameters considerably faster than existing search strategies, leading to a reduced computational and human burden as well as increased flexibility. We also show several important benefits, including increased robustness to initialization and the ability to rapidly identify optimal hyperparameter values specific to a registration task, dataset, or even a single anatomical region, all without retraining the HyperMorph model. Our code is publicly available at http://voxelmorph.mit.edu.

Submitted 4 May, 2021; v1 submitted 4 January, 2021; originally announced January 2021.

Comments: IPMI 2021: Information Processing in Medical Imaging. Keywords: Deformable Image Registration, Hyperparameter Search, Deep Learning, Hypernetworks, and Amortized Learning

arXiv:2011.11156 [pdf, other]

Better Aggregation in Test-Time Augmentation

Authors: Divya Shanmugam, Davis Blalock, Guha Balakrishnan, John Guttag

Abstract: Test-time augmentation -- the aggregation of predictions across transformed versions of a test input -- is a common practice in image classification. Traditionally, predictions are combined using a simple average. In this paper, we present 1) experimental analyses that shed light on cases in which the simple average is suboptimal and 2) a method to address these shortcomings. A key finding is that even when test-time augmentation produces a net improvement in accuracy, it can change many correct predictions into incorrect predictions. We delve into when and why test-time augmentation changes a prediction from being correct to incorrect and vice versa. Building on these insights, we present a learning-based method for aggregating test-time augmentations. Experiments across a diverse set of models, datasets, and augmentations show that our method delivers consistent improvements over existing approaches.

Submitted 11 October, 2021; v1 submitted 22 November, 2020; originally announced November 2020.

Journal ref: ICCV 2021

arXiv:2007.10233 [pdf, other]

Unsupervised Domain Adaptation in the Absence of Source Data

Authors: Roshni Sahoo, Divya Shanmugam, John Guttag

Abstract: Current unsupervised domain adaptation methods can address many types of distribution shift, but they assume data from the source domain is freely available. As the use of pre-trained models becomes more prevalent, it is reasonable to assume that source data is unavailable. We propose an unsupervised method for adapting a source classifier to a target domain that varies from the source domain along natural axes, such as brightness and contrast. Our method only requires access to unlabeled target instances and the source classifier. We validate our method in scenarios where the distribution shift involves brightness, contrast, and rotation and show that it outperforms fine-tuning baselines in scenarios with limited labeled data.

Submitted 20 July, 2020; originally announced July 2020.

arXiv:2006.00090 [pdf, other]

Anatomical Predictions using Subject-Specific Medical Data

Authors: Marianne Rakic, John Guttag, Adrian V. Dalca

Abstract: Changes over time in brain anatomy can provide important insight for treatment design or scientific analyses. We present a method that predicts how a brain MRI for an individual will change over time. We model changes using a diffeomorphic deformation field that we predict using convolutional neural networks. Given a predicted deformation field, a baseline scan can be warped to give a prediction of the brain scan at a future time. We demonstrate the method using the ADNI cohort, and analyze how performance is affected by model variants and the subject-specific information provided. We show that the model provides good predictions and that external clinical data can improve predictions.

Submitted 29 May, 2020; originally announced June 2020.

Comments: Accepted as a short paper to MIDL2020. Keywords: Medical Imaging, Multi-Modal, Prediction

Report number: MIDL/2020/ExtendedAbstract/apwZYLKTCo

arXiv:2003.03033 [pdf, other]

What is the State of Neural Network Pruning?

Authors: Davis Blalock, Jose Javier Gonzalez Ortiz, Jonathan Frankle, John Guttag

Abstract: Neural network pruning---the task of reducing the size of a network by removing parameters---has been the subject of a great deal of work in recent years. We provide a meta-analysis of the literature, including an overview of approaches to pruning and consistent findings in the literature. After aggregating results across 81 papers and pruning hundreds of models in controlled conditions, our clearest finding is that the community suffers from a lack of standardized benchmarks and metrics. This deficiency is substantial enough that it is hard to compare pruning techniques to one another or determine how much progress the field has made over the past three decades. To address this situation, we identify issues with current practices, suggest concrete remedies, and introduce ShrinkBench, an open-source framework to facilitate standardized evaluations of pruning methods. We use ShrinkBench to compare various pruning techniques and show that its comprehensive evaluation can prevent common pitfalls when comparing pruning methods.

Submitted 6 March, 2020; originally announced March 2020.

Comments: Published in Proceedings of Machine Learning and Systems 2020 (MLSys 2020)

arXiv:2001.01026 [pdf, other]

Painting Many Pasts: Synthesizing Time Lapse Videos of Paintings

Authors: Amy Zhao, Guha Balakrishnan, Kathleen M. Lewis, Frédo Durand, John V. Guttag, Adrian V. Dalca

Abstract: We introduce a new video synthesis task: synthesizing time lapse videos depicting how a given painting might have been created. Artists paint using unique combinations of brushes, strokes, and colors. There are often many possible ways to create a given painting. Our goal is to learn to capture this rich range of possibilities. Creating distributions of long-term videos is a challenge for learning-based video synthesis methods. We present a probabilistic model that, given a single image of a completed painting, recurrently synthesizes steps of the painting process. We implement this model as a convolutional neural network, and introduce a novel training scheme to enable learning from a limited dataset of painting time lapses. We demonstrate that this model can be used to sample many time steps, enabling long-term stochastic video synthesis. We evaluate our method on digital and watercolor paintings collected from video websites, and show that human raters find our synthetic videos to be similar to time lapse videos produced by real artists. Our code is available at https://xamyzhao.github.io/timecraft.

Submitted 25 April, 2020; v1 submitted 3 January, 2020; originally announced January 2020.

Comments: 10 pages, CVPR 2020

arXiv:1912.00262 [pdf, other]

Image segmentation of liver stage malaria infection with spatial uncertainty sampling

Authors: Ava P. Soleimany, Harini Suresh, Jose Javier Gonzalez Ortiz, Divya Shanmugam, Nil Gural, John Guttag, Sangeeta N. Bhatia

Abstract: Global eradication of malaria depends on the development of drugs effective against the silent, yet obligate liver stage of the disease. The gold standard in drug development remains microscopic imaging of liver stage parasites in in vitro cell culture models. Image analysis presents a major bottleneck in this pipeline since the parasite has significant variability in size, shape, and density in these models. As with other highly variable datasets, traditional segmentation models have poor generalizability as they rely on hand-crafted features; thus, manual annotation of liver stage malaria images remains standard. To address this need, we develop a convolutional neural network architecture that utilizes spatial dropout sampling for parasite segmentation and epistemic uncertainty estimation in images of liver stage malaria. Our pipeline produces high-precision segmentations nearly identical to expert annotations, generalizes well on a diverse dataset of liver stage malaria parasites, and promotes independence between learned feature maps to model the uncertainty of generated predictions.

Submitted 30 November, 2019; originally announced December 2019.

arXiv:1910.04817 [pdf, other]

Estimation of Bounds on Potential Outcomes For Decision Making

Authors: Maggie Makar, Fredrik D. Johansson, John Guttag, David Sontag

Abstract: Estimation of individual treatment effects is commonly used as the basis for contextual decision making in fields such as healthcare, education, and economics. However, it is often sufficient for the decision maker to have estimates of upper and lower bounds on the potential outcomes of decision alternatives to assess risks and benefits. We show that, in such cases, we can improve sample efficiency by estimating simple functions that bound these outcomes instead of estimating their conditional expectations, which may be complex and hard to estimate. Our analysis highlights a trade-off between the complexity of the learning task and the confidence with which the learned bounds hold. Guided by these findings, we develop an algorithm for learning upper and lower bounds on potential outcomes which optimize an objective function defined by the decision maker, subject to the probability that bounds are violated being small. Using a clinical dataset and a well-known causality benchmark, we demonstrate that our algorithm outperforms baselines, providing tighter, more reliable bounds.

Submitted 12 August, 2020; v1 submitted 10 October, 2019; originally announced October 2019.

Journal ref: ICML 2020

arXiv:1909.00475 [pdf, other]

Visual Deprojection: Probabilistic Recovery of Collapsed Dimensions

Authors: Guha Balakrishnan, Adrian V. Dalca, Amy Zhao, John V. Guttag, Fredo Durand, William T. Freeman

Abstract: We introduce visual deprojection: the task of recovering an image or video that has been collapsed along a dimension. Projections arise in various contexts, such as long-exposure photography, where a dynamic scene is collapsed in time to produce a motion-blurred image, and corner cameras, where reflected light from a scene is collapsed along a spatial dimension because of an edge occluder to yield a 1D video. Deprojection is ill-posed-- often there are many plausible solutions for a given input. We first propose a probabilistic model capturing the ambiguity of the task. We then present a variational inference strategy using convolutional neural networks as functional approximators. Sampling from the inference network at test time yields plausible candidates from the distribution of original signals that are consistent with a given input projection. We evaluate the method on several datasets for both spatial and temporal deprojection tasks. We first demonstrate the method can recover human gait videos and face images from spatial projections, and then show that it can recover videos of moving digits from dramatically motion-blurred images obtained via temporal projection.

Submitted 1 September, 2019; originally announced September 2019.

Comments: ICCV 2019

arXiv:1908.02738 [pdf, other]

Learning Conditional Deformable Templates with Convolutional Networks

Authors: Adrian V. Dalca, Marianne Rakic, John Guttag, Mert R. Sabuncu

Abstract: We develop a learning framework for building deformable templates, which play a fundamental role in many image analysis and computational anatomy tasks. Conventional methods for template creation and image alignment to the template have undergone decades of rich technical development. In these frameworks, templates are constructed using an iterative process of template estimation and alignment, which is often computationally very expensive. Due in part to this shortcoming, most methods compute a single template for the entire population of images, or a few templates for specific sub-groups of the data. In this work, we present a probabilistic model and efficient learning strategy that yields either universal or conditional templates, jointly with a neural network that provides efficient alignment of the images to these templates. We demonstrate the usefulness of this method on a variety of domains, with a special focus on neuroimaging. This is particularly useful for clinical applications where a pre-existing template does not exist, or creating a new one with traditional methods can be prohibitively expensive. Our code and atlases are available online as part of the VoxelMorph library at http://voxelmorph.csail.mit.edu.

Submitted 11 October, 2019; v1 submitted 7 August, 2019; originally announced August 2019.

Comments: NeurIPS 2019: Neural Information Processing Systems. Keywords: deformable templates, conditional atlases, diffeomorphic image registration, probabilistic models, neuroimaging

Journal ref: NeurIPS: Thirty-third Conference on Neural Information Processing Systems, 2019

arXiv:1903.03545 [pdf, other]

doi 10.1016/j.media.2019.07.006

Unsupervised Learning of Probabilistic Diffeomorphic Registration for Images and Surfaces

Authors: Adrian V. Dalca, Guha Balakrishnan, John Guttag, Mert R. Sabuncu

Abstract: Classical deformable registration techniques achieve impressive results and offer a rigorous theoretical treatment, but are computationally intensive since they solve an optimization problem for each image pair. Recently, learning-based methods have facilitated fast registration by learning spatial deformation functions. However, these approaches use restricted deformation models, require supervised labels, or do not guarantee a diffeomorphic (topology-preserving) registration. Furthermore, learning-based registration tools have not been derived from a probabilistic framework that can offer uncertainty estimates. In this paper, we build a connection between classical and learning-based methods. We present a probabilistic generative model and derive an unsupervised learning-based inference algorithm that uses insights from classical registration methods and makes use of recent developments in convolutional neural networks (CNNs). We demonstrate our method on a 3D brain registration task for both images and anatomical surfaces, and provide extensive empirical analyses. Our principled approach results in state of the art accuracy and very fast runtimes, while providing diffeomorphic guarantees. Our implementation is available at http://voxelmorph.csail.mit.edu.

Submitted 23 July, 2019; v1 submitted 8 March, 2019; originally announced March 2019.

Comments: MedIA: Medical Image Analysis (MICCAI2018 Special Issue). Expands on MICCAI 2018 paper (arXiv:1805.04605) by introducing an extension to anatomical surface registration, new experiments, and analysis of diffeomorphic implementations. Keywords: medical image registration; diffeomorphic; invertible; probabilistic modeling; variational inference. Code available at http://voxelmorph.csail.mit.edu. arXiv admin note: text overlap with arXiv:1805.04605

arXiv:1903.03503 [pdf, other]

Unsupervised Data Imputation via Variational Inference of Deep Subspaces

Authors: Adrian V. Dalca, John Guttag, Mert R. Sabuncu

Abstract: A wide range of systems exhibit high dimensional incomplete data. Accurate estimation of the missing data is often desired, and is crucial for many downstream analyses. Many state-of-the-art recovery methods involve supervised learning using datasets containing full observations. In contrast, we focus on unsupervised estimation of missing image data, where no full observations are available - a common situation in practice. Unsupervised imputation methods for images often employ a simple linear subspace to capture correlations between data dimensions, omitting more complex relationships. In this work, we introduce a general probabilistic model that describes sparse high dimensional imaging data as being generated by a deep non-linear embedding. We derive a learning algorithm using a variational approximation based on convolutional neural networks and discuss its relationship to linear imputation models, the variational auto encoder, and deep image priors. We introduce sparsity-aware network building blocks that explicitly model observed and missing data. We analyze proposed sparsity-aware network building blocks, evaluate our method on public domain imaging datasets, and conclude by showing that our method enables imputation in an important real-world problem involving medical images. The code is freely available as part of the neuron library at http://github.com/adalca/neuron.

Submitted 8 March, 2019; originally announced March 2019.

arXiv:1903.03148 [pdf, other]

doi 10.1109/CVPR.2018.00968

Anatomical Priors in Convolutional Networks for Unsupervised Biomedical Segmentation

Authors: Adrian V. Dalca, John Guttag, Mert R. Sabuncu

Abstract: We consider the problem of segmenting a biomedical image into anatomical regions of interest. We specifically address the frequent scenario where we have no paired training data that contains images and their manual segmentations. Instead, we employ unpaired segmentation images to build an anatomical prior. Critically these segmentations can be derived from imaging data from a different dataset and imaging modality than the current task. We introduce a generative probabilistic model that employs the learned prior through a convolutional neural network to compute segmentations in an unsupervised setting. We conducted an empirical analysis of the proposed approach in the context of structural brain MRI segmentation, using a multi-study dataset of more than 14,000 scans. Our results show that an anatomical prior can enable fast unsupervised segmentation which is typically not possible using standard convolutional networks. The integration of anatomical priors can facilitate CNN-based anatomical segmentation in a range of novel clinical problems, where few or no annotations are available and thus standard networks are not trainable. The code is freely available at http://github.com/adalca/neuron.

Submitted 7 March, 2019; originally announced March 2019.

Comments: Presented at CVPR 2018. IEEE CVPR proceedings pp. 9290-9299

arXiv:1902.09383 [pdf, other]

Data augmentation using learned transformations for one-shot medical image segmentation

Authors: Amy Zhao, Guha Balakrishnan, Frédo Durand, John V. Guttag, Adrian V. Dalca

Abstract: Image segmentation is an important task in many medical applications. Methods based on convolutional neural networks attain state-of-the-art accuracy; however, they typically rely on supervised training with large labeled datasets. Labeling medical images requires significant expertise and time, and typical hand-tuned approaches for data augmentation fail to capture the complex variations in such images. We present an automated data augmentation method for synthesizing labeled medical images. We demonstrate our method on the task of segmenting magnetic resonance imaging (MRI) brain scans. Our method requires only a single segmented scan, and leverages other unlabeled scans in a semi-supervised approach. We learn a model of transformations from the images, and use the model along with the labeled example to synthesize additional labeled examples. Each transformation is comprised of a spatial deformation field and an intensity change, enabling the synthesis of complex effects such as variations in anatomy and image acquisition procedures. We show that training a supervised segmenter with these new examples provides significant improvements over state-of-the-art methods for one-shot biomedical image segmentation. Our code is available at https://github.com/xamyzhao/brainstorm.

Submitted 6 April, 2019; v1 submitted 25 February, 2019; originally announced February 2019.

Comments: 9 pages, CVPR 2019

arXiv:1901.10002 [pdf, other]

doi 10.1145/3465416.3483305

A Framework for Understanding Sources of Harm throughout the Machine Learning Life Cycle

Authors: Harini Suresh, John V. Guttag

Abstract: As machine learning (ML) increasingly affects people and society, awareness of its potential unwanted consequences has also grown. To anticipate, prevent, and mitigate undesirable downstream consequences, it is critical that we understand when and how harm might be introduced throughout the ML life cycle. In this paper, we provide a framework that identifies seven distinct potential sources of downstream harm in machine learning, spanning data collection, development, and deployment. In doing so, we aim to facilitate more productive and precise communication around these issues, as well as more direct, application-grounded ways to mitigate them.

Submitted 1 December, 2021; v1 submitted 28 January, 2019; originally announced January 2019.

Journal ref: EAAMO 2021: Equity and Access in Algorithms, Mechanisms, and Optimization

arXiv:1812.06932 [pdf, other]

doi 10.1145/3368555.3384462

Fast Learning-based Registration of Sparse 3D Clinical Images

Authors: Kathleen M. Lewis, Natalia S. Rost, John Guttag, Adrian V. Dalca

Abstract: We introduce SparseVM, a method that registers clinical-quality 3D MR scans both faster and more accurately than previously possible. Deformable alignment, or registration, of clinical scans is a fundamental task for many clinical neuroscience studies. However, most registration algorithms are designed for high-resolution research-quality scans. In contrast to research-quality scans, clinical scans are often sparse, missing up to 86% of the slices available in research-quality scans. Existing methods for registering these sparse images are either inaccurate or extremely slow. We present a learning-based registration method, SparseVM, that is more accurate and orders of magnitude faster than the most accurate clinical registration methods. To our knowledge, it is the first method to use deep learning specifically tailored to registering clinical images. We demonstrate our method on a clinically-acquired MRI dataset of stroke patients and on a simulated sparse MRI dataset. Our code is available as part of the VoxelMorph package at http://voxelmorph.mit.edu/.

Submitted 6 April, 2020; v1 submitted 17 December, 2018; originally announced December 2018.

Comments: This version was accepted to CHIL. It builds on the previous version of the paper and includes more experimental results

arXiv:1812.00475 [pdf, other]

Multiple Instance Learning for ECG Risk Stratification

Authors: Divya Shanmugam, Davis Blalock, John Guttag

Abstract: Patients who suffer an acute coronary syndrome are at elevated risk for adverse cardiovascular events such as myocardial infarction and cardiovascular death. Accurate assessment of this risk is crucial to their course of care. We focus on estimating a patient's risk of cardiovascular death after an acute coronary syndrome based on a patient's raw electrocardiogram (ECG) signal. Learning from this signal is challenging for two reasons: 1) positive examples signifying a downstream cardiovascular event are scarce, causing drastic class imbalance, and 2) each patient's ECG signal consists of thousands of heartbeats, accompanied by a single label for the downstream outcome. Machine learning has been previously applied to this task, but most approaches rely on hand-crafted features and domain knowledge. We propose a method that learns a representation from the raw ECG signal by using a multiple instance learning framework. We present a learned risk score for cardiovascular death that outperforms existing risk metrics in predicting cardiovascular death within 30, 60, 90, and 365 days on a dataset of 5000 patients.

Submitted 25 March, 2020; v1 submitted 2 December, 2018; originally announced December 2018.

Comments: Machine Learning for Healthcare Conference (MLHC 2019)

arXiv:1809.05231 [pdf, other]

doi 10.1109/TMI.2019.2897538

VoxelMorph: A Learning Framework for Deformable Medical Image Registration

Authors: Guha Balakrishnan, Amy Zhao, Mert R. Sabuncu, John Guttag, Adrian V. Dalca

Abstract: We present VoxelMorph, a fast learning-based framework for deformable, pairwise medical image registration. Traditional registration methods optimize an objective function for each pair of images, which can be time-consuming for large datasets or rich deformation models. In contrast to this approach, and building on recent learning-based methods, we formulate registration as a function that maps an input image pair to a deformation field that aligns these images. We parameterize the function via a convolutional neural network (CNN), and optimize the parameters of the neural network on a set of images. Given a new pair of scans, VoxelMorph rapidly computes a deformation field by directly evaluating the function. In this work, we explore two different training strategies. In the first (unsupervised) setting, we train the model to maximize standard image matching objective functions that are based on the image intensities. In the second setting, we leverage auxiliary segmentations available in the training data. We demonstrate that the unsupervised model's accuracy is comparable to state-of-the-art methods, while operating orders of magnitude faster. We also show that VoxelMorph trained with auxiliary data improves registration accuracy at test time, and evaluate the effect of training set size on registration. Our method promises to speed up medical image analysis and processing pipelines, while facilitating novel directions in learning-based registration and its applications. Our code is freely available at voxelmorph.csail.mit.edu.

Submitted 1 September, 2019; v1 submitted 13 September, 2018; originally announced September 2018.

Comments: Accepted to IEEE TMI ( (c) IEEE). This manuscript expands the CVPR 2018 paper (arXiv:1802.02604) by introducing an auxiliary model that uses segmentation maps during training, an amortized optimization analysis, and extensive model analysis. Code available at http://voxelmorph.csail.mit.edu

arXiv:1808.02515 [pdf, other]

doi 10.1145/3264903

Sprintz: Time Series Compression for the Internet of Things

Authors: Davis Blalock, Samuel Madden, John Guttag

Abstract: Thanks to the rapid proliferation of connected devices, sensor-generated time series constitute a large and growing portion of the world's data. Often, this data is collected from distributed, resource-constrained devices and centralized at one or more servers. A key challenge in this setup is reducing the size of the transmitted data without sacrificing its quality. Lower quality reduces the data's utility, but smaller size enables both reduced network and storage costs at the servers and reduced power consumption in sensing devices. A natural solution is to compress the data at the sensing devices. Unfortunately, existing compression algorithms either violate the memory and latency constraints common for these devices or, as we show experimentally, perform poorly on sensor-generated time series. We introduce a time series compression algorithm that achieves state-of-the-art compression ratios while requiring less than 1KB of memory and adding virtually no latency. This method is suitable not only for low-power devices collecting data, but also for servers storing and querying data; in the latter context, it can decompress at over 3GB/s in a single thread, even faster than many algorithms with much lower compression ratios. A key component of our method is a high-speed forecasting algorithm that can be trained online and significantly outperforms alternatives such as delta coding. Extensive experiments on datasets from many domains show that these results hold not only for sensor data but also across a wide array of other time series.

Submitted 7 August, 2018; originally announced August 2018.

arXiv:1806.02878 [pdf, other]

doi 10.1145/3219819.3219930

Learning Tasks for Multitask Learning: Heterogenous Patient Populations in the ICU

Authors: Harini Suresh, Jen J. Gong, John Guttag

Abstract: Machine learning approaches have been effective in predicting adverse outcomes in different clinical settings. These models are often developed and evaluated on datasets with heterogeneous patient populations. However, good predictive performance on the aggregate population does not imply good performance for specific groups. In this work, we present a two-step framework to 1) learn relevant patient subgroups, and 2) predict an outcome for separate patient populations in a multi-task framework, where each population is a separate task. We demonstrate how to discover relevant groups in an unsupervised way with a sequence-to-sequence autoencoder. We show that using these groups in a multi-task framework leads to better predictive performance of in-hospital mortality both across groups and overall. We also highlight the need for more granular evaluation of performance when dealing with heterogeneous populations.

Submitted 7 June, 2018; originally announced June 2018.

Comments: KDD 2018

arXiv:1806.00397 [pdf, other]

Visualizing Patient Timelines in the Intensive Care Unit

Authors: Dina Levy-Lambert, Jen J. Gong, Tristan Naumann, Tom J. Pollard, John V. Guttag

Abstract: Electronic Health Records (EHRs) contain a large volume of heterogeneous patient data, which are useful at the point of care and for retrospective research. These data are typically stored in relational databases. Gaining an integrated view of these data for a single patient typically requires complex SQL queries joining multiple tables. In this work, we present a visualization tool that integrates heterogeneous health care data (e.g., clinical notes, laboratory test values, vital signs) into a single timeline. We train risk models offline and dynamically generate and present their predictions alongside patient data. Our visualization is designed to enable users to understand the heterogeneous temporal data quickly and comprehensively, and to place the output of analytic models in the context of the underlying data.

Submitted 1 June, 2018; originally announced June 2018.

arXiv:1805.04605 [pdf, other]

doi 10.1007/978-3-030-00928-1_82

Unsupervised Learning for Fast Probabilistic Diffeomorphic Registration

Authors: Adrian V. Dalca, Guha Balakrishnan, John Guttag, Mert R. Sabuncu

Abstract: Traditional deformable registration techniques achieve impressive results and offer a rigorous theoretical treatment, but are computationally intensive since they solve an optimization problem for each image pair. Recently, learning-based methods have facilitated fast registration by learning spatial deformation functions. However, these approaches use restricted deformation models, require supervised labels, or do not guarantee a diffeomorphic (topology-preserving) registration. Furthermore, learning-based registration tools have not been derived from a probabilistic framework that can offer uncertainty estimates. In this paper, we present a probabilistic generative model and derive an unsupervised learning-based inference algorithm that makes use of recent developments in convolutional neural networks (CNNs). We demonstrate our method on a 3D brain registration task, and provide an empirical analysis of the algorithm. Our approach results in state of the art accuracy and very fast runtimes, while providing diffeomorphic guarantees and uncertainty estimates. Our implementation is available online at http://voxelmorph.csail.mit.edu .

Submitted 14 September, 2018; v1 submitted 11 May, 2018; originally announced May 2018.

Comments: MICCAI 2018 (Oral Presentation). Proceedings: LNCS 11070, pp 729-738

Journal ref: LNCS 11070, pp 729-738, Springer. 2018

arXiv:1804.07739 [pdf, other]

Synthesizing Images of Humans in Unseen Poses

Authors: Guha Balakrishnan, Amy Zhao, Adrian V. Dalca, Fredo Durand, John Guttag

Abstract: We address the computational problem of novel human pose synthesis. Given an image of a person and a desired pose, we produce a depiction of that person in that pose, retaining the appearance of both the person and background. We present a modular generative neural network that synthesizes unseen poses using training pairs of images and poses taken from human action videos. Our network separates a scene into different body part and background layers, moves body parts to new locations and refines their appearances, and composites the new foreground with a hole-filled background. These subtasks, implemented with separate modules, are trained jointly using only a single target image as a supervised label. We use an adversarial discriminator to force our network to synthesize realistic details conditioned on pose. We demonstrate image synthesis results on three action classes: golf, yoga/workouts and tennis, and show that our method produces accurate results within action classes as well as across action classes. Given a sequence of desired poses, we also produce coherent videos of actions.

Submitted 20 April, 2018; originally announced April 2018.

Comments: CVPR 2018

arXiv:1802.02604 [pdf, other]

doi 10.1109/CVPR.2018.00964

An Unsupervised Learning Model for Deformable Medical Image Registration

Authors: Guha Balakrishnan, Amy Zhao, Mert R. Sabuncu, John Guttag, Adrian V. Dalca

Abstract: We present a fast learning-based algorithm for deformable, pairwise 3D medical image registration. Current registration methods optimize an objective function independently for each pair of images, which can be time-consuming for large data. We define registration as a parametric function, and optimize its parameters given a set of images from a collection of interest. Given a new pair of scans, we can quickly compute a registration field by directly evaluating the function using the learned parameters. We model this function using a convolutional neural network (CNN), and use a spatial transform layer to reconstruct one image from another while imposing smoothness constraints on the registration field. The proposed method does not require supervised information such as ground truth registration fields or anatomical landmarks. We demonstrate registration accuracy comparable to state-of-the-art 3D image registration, while operating orders of magnitude faster in practice. Our method promises to significantly speed up medical image analysis and processing pipelines, while facilitating novel directions in learning-based registration and its applications. Our code is available at https://github.com/balakg/voxelmorph .

Submitted 20 April, 2018; v1 submitted 7 February, 2018; originally announced February 2018.

Comments: 9 pages, in CVPR 2018

arXiv:1712.00643 [pdf, other]

Learning the Probability of Activation in the Presence of Latent Spreaders

Authors: Maggie Makar, John Guttag, Jenna Wiens

Abstract: When an infection spreads in a community, an individual's probability of becoming infected depends on both her susceptibility and exposure to the contagion through contact with others. While one often has knowledge regarding an individual's susceptibility, in many cases, whether or not an individual's contacts are contagious is unknown. We study the problem of predicting if an individual will adopt a contagion in the presence of multiple modes of infection (exposure/susceptibility) and latent neighbor influence. We present a generative probabilistic model and a variational inference method to learn the parameters of our model. Through a series of experiments on synthetic data, we measure the ability of the proposed model to identify latent spreaders, and predict the risk of infection. Applied to a real dataset of 20,000 hospital patients, we demonstrate the utility of our model in predicting the onset of a healthcare associated infection using patient room-sharing and nurse-sharing networks. Our model outperforms existing benchmarks and provides actionable insights for the design and implementation of targeted interventions to curb the spread of infection.

Submitted 2 December, 2017; originally announced December 2017.

Comments: To appear in AAAI-18

arXiv:1706.10283 [pdf, other]

doi 10.1145/3097983.3098195

Bolt: Accelerated Data Mining with Fast Vector Compression

Authors: Davis W Blalock, John V Guttag

Abstract: Vectors of data are at the heart of machine learning and data mining. Recently, vector quantization methods have shown great promise in reducing both the time and space costs of operating on vectors. We introduce a vector quantization algorithm that can compress vectors over 12x faster than existing techniques while also accelerating approximate vector operations such as distance and dot product computations by up to 10x. Because it can encode over 2GB of vectors per second, it makes vector quantization cheap enough to employ in many more circumstances. For example, using our technique to compute approximate dot products in a nested loop can multiply matrices faster than a state-of-the-art BLAS implementation, even when our algorithm must first compress the matrices. In addition to showing the above speedups, we demonstrate that our approach can accelerate nearest neighbor search and maximum inner product search by over 100x compared to floating point operations and up to 10x compared to other vector quantization methods. Our approximate Euclidean distance and dot product computations are not only faster than those of related algorithms with slower encodings, but also faster than Hamming distance computations, which have direct hardware support on the tested platforms. We also assess the errors of our algorithm's approximate distances and dot products, and find that it is competitive with existing, slower vector quantization algorithms.

Submitted 30 June, 2017; originally announced June 2017.

Comments: Research track paper at KDD 2017

arXiv:1612.04007 [pdf, other]

A Video-Based Method for Objectively Rating Ataxia

Authors: Ronnachai Jaroensri, Amy Zhao, Guha Balakrishnan, Derek Lo, Jeremy Schmahmann, John Guttag, Fredo Durand

Abstract: For many movement disorders, such as Parkinson's disease and ataxia, disease progression is visually assessed by a clinician using a numerical disease rating scale. These tests are subjective, time-consuming, and must be administered by a professional. This can be problematic where specialists are not available, or when a patient is not consistently evaluated by the same clinician. We present an automated method for quantifying the severity of motion impairment in patients with ataxia, using only video recordings. We consider videos of the finger-to-nose test, a common movement task used as part of the assessment of ataxia progression during the course of routine clinical checkups. Our method uses neural network-based pose estimation and optical flow techniques to track the motion of the patient's hand in a video recording. We extract features that describe qualities of the motion such as speed and variation in performance. Using labels provided by an expert clinician, we train a supervised learning model that predicts severity according to the Brief Ataxia Rating Scale (BARS). The performance of our system is comparable to that of a group of ataxia specialists in terms of mean error and correlation, and our system's predictions were consistently within the range of inter-rater variability. This work demonstrates the feasibility of using computer vision and machine learning to produce consistent and clinically useful measures of motor impairment.

Submitted 7 September, 2017; v1 submitted 12 December, 2016; originally announced December 2016.

Comments: MLHC 2017

arXiv:1609.09196 [pdf, other]

EXTRACT: Strong Examples from Weakly-Labeled Sensor Data

Authors: Davis W. Blalock, John V. Guttag

Abstract: Thanks to the rise of wearable and connected devices, sensor-generated time series comprise a large and growing fraction of the world's data. Unfortunately, extracting value from this data can be challenging, since sensors report low-level signals (e.g., acceleration), not the high-level events that are typically of interest (e.g., gestures). We introduce a technique to bridge this gap by automatically extracting examples of real-world events in low-level data, given only a rough estimate of when these events have taken place. By identifying sets of features that repeat in the same temporal arrangement, we isolate examples of such diverse events as human actions, power consumption patterns, and spoken words with up to 96% precision and recall. Our method is fast enough to run in real time and assumes only minimal knowledge of which variables are relevant or the lengths of events. Our evaluation uses numerous publicly available datasets and over 1 million samples of manually labeled sensor data.

Submitted 29 September, 2016; originally announced September 2016.

Comments: To appear in IEEE International Conference on Data Mining 2016

arXiv:1608.02301 [pdf, other]

Uncovering Voice Misuse Using Symbolic Mismatch

Authors: Marzyeh Ghassemi, Zeeshan Syed, Daryush D. Mehta, Jarrad H. Van Stan, Robert E. Hillman, John V. Guttag

Abstract: Voice disorders affect an estimated 14 million working-aged Americans, and many more worldwide. We present the first large scale study of vocal misuse based on long-term ambulatory data collected by an accelerometer placed on the neck. We investigate an unsupervised data mining approach to uncovering latent information about voice misuse. We segment signals from over 253 days of data from 22 subjects into over a hundred million single glottal pulses (closures of the vocal folds), cluster segments into symbols, and use symbolic mismatch to uncover differences between patients and matched controls, and between patients pre- and post-treatment. Our results show significant behavioral differences between patients and controls, as well as between some pre- and post-treatment patients. Our proposed approach provides an objective basis for helping diagnose behavioral voice disorders, and is a first step towards a more data-driven understanding of the impact of voice therapy.

Submitted 7 August, 2016; originally announced August 2016.

Comments: Presented at 2016 Machine Learning and Healthcare Conference (MLHC 2016), Los Angeles, CA

arXiv:1608.02071 [pdf, ps, other]

Transferring Knowledge from Text to Predict Disease Onset

Authors: Yun Liu, Kun-Ta Chuang, Fu-Wen Liang, Huey-Jen Su, Collin M. Stultz, John V. Guttag

Abstract: In many domains such as medicine, training data is in short supply. In such cases, external knowledge is often helpful in building predictive models. We propose a novel method to incorporate publicly available domain expertise to build accurate models. Specifically, we use word2vec models trained on a domain-specific corpus to estimate the relevance of each feature's text description to the prediction problem. We use these relevance estimates to rescale the features, causing more important features to experience weaker regularization. We apply our method to predict the onset of five chronic diseases in the next five years in two genders and two age groups. Our rescaling approach improves the accuracy of the model, particularly when there are few positive examples. Furthermore, our method selects 60% fewer features, easing interpretation by physicians. Our method is applicable to other domains where feature and outcome descriptions are available.

Submitted 6 August, 2016; originally announced August 2016.

Comments: Presented at 2016 Machine Learning and Healthcare Conference (MLHC 2016), Los Angeles, CA

Report number: Proceedings of Machine Learning Research Volume 56

Showing 1–48 of 48 results for author: Guttag, J