-
Saliency Cards: A Framework to Characterize and Compare Saliency Methods
Authors:
Angie Boggust,
Harini Suresh,
Hendrik Strobelt,
John V. Guttag,
Arvind Satyanarayan
Abstract:
Saliency methods are a common class of machine learning interpretability techniques that calculate how important each input feature is to a model's output. We find that, with the rapid pace of development, users struggle to stay informed of the strengths and limitations of new methods and, thus, choose methods for unprincipled reasons (e.g., popularity). Moreover, despite a corresponding rise in evaluation metrics, existing approaches assume universal desiderata for saliency methods (e.g., faithfulness) that do not account for diverse user needs. In response, we introduce saliency cards: structured documentation of how saliency methods operate and their performance across a battery of evaluative metrics. Through a review of 25 saliency method papers and 33 method evaluations, we identify 10 attributes that users should account for when choosing a method. We group these attributes into three categories that span the process of computing and interpreting saliency: methodology, or how the saliency is calculated; sensitivity, or the relationship between the saliency and the underlying model and data; and, perceptibility, or how an end user ultimately interprets the result. By collating this information, saliency cards allow users to more holistically assess and compare the implications of different methods. Through nine semi-structured interviews with users from various backgrounds, including researchers, radiologists, and computational biologists, we find that saliency cards provide a detailed vocabulary for discussing individual methods and allow for a more systematic selection of task-appropriate methods. Moreover, with saliency cards, we are able to analyze the research landscape in a more structured fashion to identify opportunities for new methods and evaluation metrics for unmet user needs.
Submitted 30 May, 2023; v1 submitted 6 June, 2022;
originally announced June 2022.
-
Intuitively Assessing ML Model Reliability through Example-Based Explanations and Editing Model Inputs
Authors:
Harini Suresh,
Kathleen M. Lewis,
John V. Guttag,
Arvind Satyanarayan
Abstract:
Interpretability methods aim to help users build trust in and understand the capabilities of machine learning models. However, existing approaches often rely on abstract, complex visualizations that poorly map to the task at hand or require non-trivial ML expertise to interpret. Here, we present two visual analytics modules that facilitate an intuitive assessment of model reliability. To help users better characterize and reason about a model's uncertainty, we visualize raw and aggregate information about a given input's nearest neighbors. Using an interactive editor, users can manipulate this input in semantically-meaningful ways, determine the effect on the output, and compare against their prior expectations. We evaluate our interface using an electrocardiogram beat classification case study. Compared to a baseline feature importance interface, we find that 14 physicians are better able to align the model's uncertainty with domain-relevant factors and build intuition about its capabilities and limitations.
Submitted 9 July, 2021; v1 submitted 16 February, 2021;
originally announced February 2021.
-
Painting Many Pasts: Synthesizing Time Lapse Videos of Paintings
Authors:
Amy Zhao,
Guha Balakrishnan,
Kathleen M. Lewis,
Frédo Durand,
John V. Guttag,
Adrian V. Dalca
Abstract:
We introduce a new video synthesis task: synthesizing time lapse videos depicting how a given painting might have been created. Artists paint using unique combinations of brushes, strokes, and colors. There are often many possible ways to create a given painting. Our goal is to learn to capture this rich range of possibilities.
Creating distributions of long-term videos is a challenge for learning-based video synthesis methods. We present a probabilistic model that, given a single image of a completed painting, recurrently synthesizes steps of the painting process. We implement this model as a convolutional neural network, and introduce a novel training scheme to enable learning from a limited dataset of painting time lapses. We demonstrate that this model can be used to sample many time steps, enabling long-term stochastic video synthesis. We evaluate our method on digital and watercolor paintings collected from video websites, and show that human raters find our synthetic videos to be similar to time lapse videos produced by real artists. Our code is available at https://xamyzhao.github.io/timecraft.
Submitted 25 April, 2020; v1 submitted 3 January, 2020;
originally announced January 2020.
-
Visual Deprojection: Probabilistic Recovery of Collapsed Dimensions
Authors:
Guha Balakrishnan,
Adrian V. Dalca,
Amy Zhao,
John V. Guttag,
Fredo Durand,
William T. Freeman
Abstract:
We introduce visual deprojection: the task of recovering an image or video that has been collapsed along a dimension. Projections arise in various contexts, such as long-exposure photography, where a dynamic scene is collapsed in time to produce a motion-blurred image, and corner cameras, where reflected light from a scene is collapsed along a spatial dimension because of an edge occluder to yield a 1D video. Deprojection is ill-posed: often there are many plausible solutions for a given input. We first propose a probabilistic model capturing the ambiguity of the task. We then present a variational inference strategy using convolutional neural networks as functional approximators. Sampling from the inference network at test time yields plausible candidates from the distribution of original signals that are consistent with a given input projection. We evaluate the method on several datasets for both spatial and temporal deprojection tasks. We first demonstrate the method can recover human gait videos and face images from spatial projections, and then show that it can recover videos of moving digits from dramatically motion-blurred images obtained via temporal projection.
Submitted 1 September, 2019;
originally announced September 2019.
-
Data augmentation using learned transformations for one-shot medical image segmentation
Authors:
Amy Zhao,
Guha Balakrishnan,
Frédo Durand,
John V. Guttag,
Adrian V. Dalca
Abstract:
Image segmentation is an important task in many medical applications. Methods based on convolutional neural networks attain state-of-the-art accuracy; however, they typically rely on supervised training with large labeled datasets. Labeling medical images requires significant expertise and time, and typical hand-tuned approaches for data augmentation fail to capture the complex variations in such images.
We present an automated data augmentation method for synthesizing labeled medical images. We demonstrate our method on the task of segmenting magnetic resonance imaging (MRI) brain scans. Our method requires only a single segmented scan, and leverages other unlabeled scans in a semi-supervised approach. We learn a model of transformations from the images, and use the model along with the labeled example to synthesize additional labeled examples. Each transformation is comprised of a spatial deformation field and an intensity change, enabling the synthesis of complex effects such as variations in anatomy and image acquisition procedures. We show that training a supervised segmenter with these new examples provides significant improvements over state-of-the-art methods for one-shot biomedical image segmentation. Our code is available at https://github.com/xamyzhao/brainstorm.
Submitted 6 April, 2019; v1 submitted 25 February, 2019;
originally announced February 2019.
-
A Framework for Understanding Sources of Harm throughout the Machine Learning Life Cycle
Authors:
Harini Suresh,
John V. Guttag
Abstract:
As machine learning (ML) increasingly affects people and society, awareness of its potential unwanted consequences has also grown. To anticipate, prevent, and mitigate undesirable downstream consequences, it is critical that we understand when and how harm might be introduced throughout the ML life cycle. In this paper, we provide a framework that identifies seven distinct potential sources of downstream harm in machine learning, spanning data collection, development, and deployment. In doing so, we aim to facilitate more productive and precise communication around these issues, as well as more direct, application-grounded ways to mitigate them.
Submitted 1 December, 2021; v1 submitted 28 January, 2019;
originally announced January 2019.
-
Visualizing Patient Timelines in the Intensive Care Unit
Authors:
Dina Levy-Lambert,
Jen J. Gong,
Tristan Naumann,
Tom J. Pollard,
John V. Guttag
Abstract:
Electronic Health Records (EHRs) contain a large volume of heterogeneous patient data, which are useful at the point of care and for retrospective research. These data are typically stored in relational databases. Gaining an integrated view of these data for a single patient typically requires complex SQL queries joining multiple tables. In this work, we present a visualization tool that integrates heterogeneous health care data (e.g., clinical notes, laboratory test values, vital signs) into a single timeline. We train risk models offline and dynamically generate and present their predictions alongside patient data. Our visualization is designed to enable users to understand the heterogeneous temporal data quickly and comprehensively, and to place the output of analytic models in the context of the underlying data.
Submitted 1 June, 2018;
originally announced June 2018.
-
Bolt: Accelerated Data Mining with Fast Vector Compression
Authors:
Davis W Blalock,
John V Guttag
Abstract:
Vectors of data are at the heart of machine learning and data mining. Recently, vector quantization methods have shown great promise in reducing both the time and space costs of operating on vectors. We introduce a vector quantization algorithm that can compress vectors over 12x faster than existing techniques while also accelerating approximate vector operations such as distance and dot product computations by up to 10x. Because it can encode over 2GB of vectors per second, it makes vector quantization cheap enough to employ in many more circumstances. For example, using our technique to compute approximate dot products in a nested loop can multiply matrices faster than a state-of-the-art BLAS implementation, even when our algorithm must first compress the matrices.
In addition to showing the above speedups, we demonstrate that our approach can accelerate nearest neighbor search and maximum inner product search by over 100x compared to floating point operations and up to 10x compared to other vector quantization methods. Our approximate Euclidean distance and dot product computations are not only faster than those of related algorithms with slower encodings, but also faster than Hamming distance computations, which have direct hardware support on the tested platforms. We also assess the errors of our algorithm's approximate distances and dot products, and find that it is competitive with existing, slower vector quantization algorithms.
Submitted 30 June, 2017;
originally announced June 2017.
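To make the idea of approximate dot products on quantized vectors concrete, here is a minimal sketch of a generic product-quantization-style lookup. It is not the algorithm from the paper above: the codebooks are hand-picked and hypothetical (a real system would learn them from data), and every function name is an assumption introduced for illustration. The sketch shows only the general pattern: a vector is stored as one centroid index per subspace, and a dot product against a query becomes one table lookup per subspace.

```python
def nearest_centroid(subvector, centroids):
    """Return the index of the centroid closest (squared Euclidean) to subvector."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(subvector, centroid))
    return min(range(len(centroids)), key=lambda k: sq_dist(centroids[k]))


def encode(vector, codebooks, sub_len):
    """Compress a vector into one centroid index per subspace."""
    return [
        nearest_centroid(vector[m * sub_len:(m + 1) * sub_len], codebook)
        for m, codebook in enumerate(codebooks)
    ]


def decode(codes, codebooks):
    """Reconstruct the quantized vector by concatenating the chosen centroids."""
    out = []
    for m, code in enumerate(codes):
        out.extend(codebooks[m][code])
    return out


def build_dot_tables(query, codebooks, sub_len):
    """Precompute dot(query subvector, centroid) for every subspace and centroid."""
    return [
        [
            sum(q * c for q, c in zip(query[m * sub_len:(m + 1) * sub_len], centroid))
            for centroid in codebook
        ]
        for m, codebook in enumerate(codebooks)
    ]


def approx_dot(codes, tables):
    """Approximate dot product: one table lookup per subspace, then a sum."""
    return sum(tables[m][code] for m, code in enumerate(codes))


# Hypothetical hand-picked codebooks: 2 subspaces of length 2, 2 centroids each.
SUB_LEN = 2
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]

vector = [0.9, 1.2, 0.1, 0.8]
query = [1.0, 2.0, 3.0, 4.0]

codes = encode(vector, CODEBOOKS, SUB_LEN)
tables = build_dot_tables(query, CODEBOOKS, SUB_LEN)
estimate = approx_dot(codes, tables)
exact = sum(v * q for v, q in zip(vector, query))
print(codes, estimate, exact)
```

The estimate equals the exact dot product between the query and the reconstructed (decoded) vector, so the approximation error comes entirely from quantization. The tables are built once per query and reused for every encoded vector.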
-
EXTRACT: Strong Examples from Weakly-Labeled Sensor Data
Authors:
Davis W. Blalock,
John V. Guttag
Abstract:
Thanks to the rise of wearable and connected devices, sensor-generated time series comprise a large and growing fraction of the world's data. Unfortunately, extracting value from this data can be challenging, since sensors report low-level signals (e.g., acceleration), not the high-level events that are typically of interest (e.g., gestures). We introduce a technique to bridge this gap by automatically extracting examples of real-world events in low-level data, given only a rough estimate of when these events have taken place.
By identifying sets of features that repeat in the same temporal arrangement, we isolate examples of such diverse events as human actions, power consumption patterns, and spoken words with up to 96% precision and recall. Our method is fast enough to run in real time and assumes only minimal knowledge of which variables are relevant or the lengths of events. Our evaluation uses numerous publicly available datasets and over 1 million samples of manually labeled sensor data.
Submitted 29 September, 2016;
originally announced September 2016.
-
Uncovering Voice Misuse Using Symbolic Mismatch
Authors:
Marzyeh Ghassemi,
Zeeshan Syed,
Daryush D. Mehta,
Jarrad H. Van Stan,
Robert E. Hillman,
John V. Guttag
Abstract:
Voice disorders affect an estimated 14 million working-aged Americans, and many more worldwide. We present the first large scale study of vocal misuse based on long-term ambulatory data collected by an accelerometer placed on the neck. We investigate an unsupervised data mining approach to uncovering latent information about voice misuse.
We segment signals from over 253 days of data from 22 subjects into over a hundred million single glottal pulses (closures of the vocal folds), cluster segments into symbols, and use symbolic mismatch to uncover differences between patients and matched controls, and between patients pre- and post-treatment. Our results show significant behavioral differences between patients and controls, as well as between some pre- and post-treatment patients. Our proposed approach provides an objective basis for helping diagnose behavioral voice disorders, and is a first step towards a more data-driven understanding of the impact of voice therapy.
Submitted 7 August, 2016;
originally announced August 2016.
-
Transferring Knowledge from Text to Predict Disease Onset
Authors:
Yun Liu,
Kun-Ta Chuang,
Fu-Wen Liang,
Huey-Jen Su,
Collin M. Stultz,
John V. Guttag
Abstract:
In many domains such as medicine, training data is in short supply. In such cases, external knowledge is often helpful in building predictive models. We propose a novel method to incorporate publicly available domain expertise to build accurate models. Specifically, we use word2vec models trained on a domain-specific corpus to estimate the relevance of each feature's text description to the prediction problem. We use these relevance estimates to rescale the features, causing more important features to experience weaker regularization.
We apply our method to predict the onset of five chronic diseases in the next five years in two genders and two age groups. Our rescaling approach improves the accuracy of the model, particularly when there are few positive examples. Furthermore, our method selects 60% fewer features, easing interpretation by physicians. Our method is applicable to other domains where feature and outcome descriptions are available.
Submitted 6 August, 2016;
originally announced August 2016.
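The last abstract above states that rescaling features causes more important features to experience weaker regularization. The sketch below illustrates that mechanism with a toy two-feature ridge regression (an L2 penalty is assumed here for simplicity; the abstract does not name the regularizer). The data, the scale factors, and all function names are hypothetical, and the word2vec-based relevance estimation is not reproduced. Multiplying feature j by a scale s_j and fitting with a uniform penalty lam gives the same model as fitting the original features with a per-feature penalty lam / s_j**2.

```python
def solve_2x2(a, b):
    """Solve the 2x2 linear system a @ x = b with Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - b[0] * a[1][0]) / det
    return [x0, x1]


def ridge_2d(rows, targets, penalties):
    """Two-feature ridge regression with a separate L2 penalty per feature.

    Minimizes sum((y - w . x)^2) + sum(penalties[j] * w[j]^2).
    """
    gram = [[0.0, 0.0], [0.0, 0.0]]
    xty = [0.0, 0.0]
    for x, y in zip(rows, targets):
        for i in range(2):
            xty[i] += x[i] * y
            for j in range(2):
                gram[i][j] += x[i] * x[j]
    gram[0][0] += penalties[0]
    gram[1][1] += penalties[1]
    return solve_2x2(gram, xty)


def ridge_with_rescaled_features(rows, targets, lam, scales):
    """Scale each feature, fit with a uniform penalty, map weights back."""
    scaled_rows = [[x[0] * scales[0], x[1] * scales[1]] for x in rows]
    v = ridge_2d(scaled_rows, targets, [lam, lam])
    # A weight v[j] on the scaled feature equals v[j] * scales[j] on the original.
    return [v[0] * scales[0], v[1] * scales[1]]


# Hypothetical toy data in which both features matter equally (y = x0 + x1).
rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
targets = [3.0, 3.0, 7.0, 7.0]
lam = 1.0
scales = [2.0, 0.5]  # assumed relevance: feature 0 high, feature 1 low

w_rescaled = ridge_with_rescaled_features(rows, targets, lam, scales)
w_equivalent = ridge_2d(rows, targets, [lam / s ** 2 for s in scales])
print(w_rescaled, w_equivalent)
```

Although the two features are symmetric in this toy data, the feature with the larger scale ends up with the larger weight, because its effective penalty is smaller.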