
How Consistent are Clinicians? Evaluating the Predictability of Sepsis Disease Progression with Dynamics Models

Authors: Unnseo Park, Venkatesh Sivaraman, Adam Perer

Abstract: Reinforcement learning (RL) is a promising approach to generate treatment policies for sepsis patients in intensive care. While retrospective evaluation metrics show decreased mortality when these policies are followed, studies with clinicians suggest their recommendations are often spurious. We propose that these shortcomings may be due to lack of diversity in observed actions and outcomes in the training data, and we construct experiments to investigate the feasibility of predicting sepsis disease severity changes due to clinician actions. Preliminary results suggest incorporating action information does not significantly improve model performance, indicating that clinician actions may not be sufficiently variable to yield measurable effects on disease progression. We discuss the implications of these findings for optimizing sepsis treatment.

Submitted 10 April, 2024; originally announced April 2024.

Comments: 6 pages, 3 figures; accepted workshop paper at Time Series for Health @ ICLR 2024

arXiv:2402.13437

doi 10.1145/3613904.3641896

Sketching AI Concepts with Capabilities and Examples: AI Innovation in the Intensive Care Unit

Authors: Nur Yildirim, Susanna Zlotnikov, Deniz Sayar, Jeremy M. Kahn, Leigh A. Bukowski, Sher Shah Amin, Kathryn A. Riman, Billie S. Davis, John S. Minturn, Andrew J. King, Dan Ricketts, Lu Tang, Venkatesh Sivaraman, Adam Perer, Sarah M. Preum, James McCann, John Zimmerman

Abstract: Advances in artificial intelligence (AI) have enabled unprecedented capabilities, yet innovation teams struggle when envisioning AI concepts. Data science teams think of innovations users do not want, while domain experts think of innovations that cannot be built. A lack of effective ideation seems to be a breakdown point. How might multidisciplinary teams identify buildable and desirable use cases? This paper presents a first-hand account of ideating AI concepts to improve critical care medicine. As a team of data scientists, clinicians, and HCI researchers, we conducted a series of design workshops to explore more effective approaches to AI concept ideation and problem formulation. We detail our process, the challenges we encountered, and practices and artifacts that proved effective. We discuss the research implications for improved collaboration and stakeholder engagement, and discuss the role HCI might play in reducing the high failure rate experienced in AI innovation.

Submitted 20 February, 2024; originally announced February 2024.

Comments: to appear at CHI 2024

arXiv:2308.03964

Dead or Alive: Continuous Data Profiling for Interactive Data Science

Authors: Will Epperson, Vaishnavi Gorantla, Dominik Moritz, Adam Perer

Abstract: Profiling data by plotting distributions and analyzing summary statistics is a critical step throughout data analysis. Currently, this process is manual and tedious since analysts must write extra code to examine their data after every transformation. This inefficiency may lead to data scientists profiling their data infrequently, rather than after each transformation, making it easy for them to miss important errors or insights. We propose continuous data profiling as a process that allows analysts to immediately see interactive visual summaries of their data throughout their data analysis to facilitate fast and thorough analysis. Our system, AutoProfiler, presents three ways to support continuous data profiling: it automatically displays data distributions and summary statistics to facilitate data comprehension; it is live, so visualizations are always accessible and update automatically as the data updates; and it supports follow-up analysis and documentation by authoring code for the user in the notebook. In a user study with 16 participants, we evaluate two versions of our system that integrate different levels of automation: both automatically show data profiles and facilitate code authoring; however, one version updates reactively and the other updates only on demand. We find that both tools facilitate insight discovery, with 91% of user-generated insights originating from the tools rather than manual profiling code written by users. Participants found live updates intuitive and felt they helped them verify their transformations, while those with on-demand profiles liked the ability to look at past visualizations. We also present a longitudinal case study on how AutoProfiler helped domain scientists find serendipitous insights about their data through automatic, live data profiles. Our results have implications for the design of future tools that offer automated data analysis support.

Submitted 7 August, 2023; originally announced August 2023.

Comments: To appear at IEEE VIS conference 2023

arXiv:2307.13566

doi 10.1145/3641022

The Impact of Imperfect XAI on Human-AI Decision-Making

Authors: Katelyn Morrison, Philipp Spitzer, Violet Turri, Michelle Feng, Niklas Kühl, Adam Perer

Abstract: Explainability techniques are rapidly being developed to improve human-AI decision-making across various cooperative work settings. Consequently, previous research has evaluated how decision-makers collaborate with imperfect AI by investigating appropriate reliance and task performance with the aim of designing more human-centered computer-supported collaborative tools. Several human-centered explainable AI (XAI) techniques have been proposed in hopes of improving decision-makers' collaboration with AI; however, these techniques are grounded in findings from previous studies that primarily focus on the impact of incorrect AI advice. Few studies acknowledge the possibility of the explanations being incorrect even if the AI advice is correct. Thus, it is crucial to understand how imperfect XAI affects human-AI decision-making. In this work, we contribute a robust, mixed-methods user study with 136 participants to evaluate how incorrect explanations influence humans' decision-making behavior in a bird species identification task, taking into account their level of expertise and an explanation's level of assertiveness. Our findings reveal the influence of imperfect XAI and humans' level of expertise on their reliance on AI and human-AI team performance. We also discuss how explanations can deceive decision-makers during human-AI collaboration. Hence, we shed light on the impacts of imperfect XAI in the field of computer-supported cooperative work and provide guidelines for designers of human-AI collaboration systems.

Submitted 8 May, 2024; v1 submitted 25 July, 2023; originally announced July 2023.

Comments: Accepted to ACM CSCW 2024. 27 pages, 9 figures, 1 table, additional figures/table in the appendix

arXiv:2302.04732

doi 10.1145/3544548.3581268

Zeno: An Interactive Framework for Behavioral Evaluation of Machine Learning

Authors: Ángel Alexander Cabrera, Erica Fu, Donald Bertucci, Kenneth Holstein, Ameet Talwalkar, Jason I. Hong, Adam Perer

Abstract: Machine learning models with high accuracy on test data can still produce systematic failures, such as harmful biases and safety issues, when deployed in the real world. To detect and mitigate such failures, practitioners run behavioral evaluation of their models, checking model outputs for specific types of inputs. Behavioral evaluation is important but challenging, requiring that practitioners discover real-world patterns and validate systematic failures. We conducted 18 semi-structured interviews with ML practitioners to better understand the challenges of behavioral evaluation and found that it is a collaborative, use-case-first process that is not adequately supported by existing task- and domain-specific tools. Using these findings, we designed Zeno, a general-purpose framework for visualizing and testing AI systems across diverse use cases. In four case studies with participants using Zeno on real-world models, we found that practitioners were able to reproduce previous manual analyses and discover new systematic failures.

Submitted 9 February, 2023; originally announced February 2023.

arXiv:2302.00096

Ignore, Trust, or Negotiate: Understanding Clinician Acceptance of AI-Based Treatment Recommendations in Health Care

Authors: Venkatesh Sivaraman, Leigh A. Bukowski, Joel Levin, Jeremy M. Kahn, Adam Perer

Abstract: Artificial intelligence (AI) in healthcare has the potential to improve patient outcomes, but clinician acceptance remains a critical barrier. We developed a novel decision support interface that provides interpretable treatment recommendations for sepsis, a life-threatening condition in which decisional uncertainty is common, treatment practices vary widely, and poor outcomes can occur even with optimal decisions. This system formed the basis of a mixed-methods study in which 24 intensive care clinicians made AI-assisted decisions on real patient cases. We found that explanations generally increased confidence in the AI, but concordance with specific recommendations varied beyond the binary acceptance or rejection described in prior work. Although clinicians sometimes ignored or trusted the AI, they also often prioritized aspects of the recommendations to follow, reject, or delay in a process we term "negotiation." These results reveal novel barriers to adoption of treatment-focused AI tools and suggest ways to better support differing clinician perspectives.

Submitted 31 January, 2023; originally announced February 2023.

Comments: CHI 2023

arXiv:2301.06937

doi 10.1145/3579612

Improving Human-AI Collaboration With Descriptions of AI Behavior

Authors: Ángel Alexander Cabrera, Adam Perer, Jason I. Hong

Abstract: People work with AI systems to improve their decision making, but often under- or over-rely on AI predictions and perform worse than they would have unassisted. To help people appropriately rely on AI aids, we propose showing them behavior descriptions, details of how AI systems perform on subgroups of instances. We tested the efficacy of behavior descriptions through user studies with 225 participants in three distinct domains: fake review detection, satellite image classification, and bird classification. We found that behavior descriptions can increase human-AI accuracy through two mechanisms: helping people identify AI failures and increasing people's reliance on the AI when it is more accurate. These findings highlight the importance of people's mental models in human-AI collaboration and show that informing people of high-level AI behaviors can significantly improve AI-assisted decision making.
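The quantity underlying a behavior description, accuracy on a subgroup compared with overall accuracy, can be illustrated with a short sketch. This is a hypothetical example, not the authors' implementation: the `(subgroup, prediction, label)` record format, the `behavior_descriptions` helper, the 10-point `margin` threshold, and the sentence template are all assumptions made for illustration.

```python
from collections import defaultdict


def behavior_descriptions(records, margin=0.10):
    """Return plain-language notes on subgroups whose accuracy departs
    from overall accuracy by at least `margin`.

    Hypothetical sketch: `records` is an iterable of
    (subgroup, prediction, label) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, pred, label in records:
        total[group] += 1
        correct[group] += int(pred == label)

    overall = sum(correct.values()) / sum(total.values())
    notes = []
    for group in sorted(total):
        acc = correct[group] / total[group]
        if acc <= overall - margin:
            direction = "less"
        elif acc >= overall + margin:
            direction = "more"
        else:
            continue  # close to overall accuracy: nothing worth describing
        notes.append(
            f"The AI is {direction} accurate on {group} instances "
            f"({acc:.0%} vs. {overall:.0%} overall)."
        )
    return notes


if __name__ == "__main__":
    data = [
        ("night", 0, 1), ("night", 1, 1),
        ("day", 1, 1), ("day", 0, 0), ("day", 1, 1), ("day", 1, 1),
    ]
    for note in behavior_descriptions(data):
        print(note)
```

In the toy data, "night" images are classified correctly half the time against roughly 83% overall, so the helper flags that subgroup as one where a person might rely less on the AI. A real study would define subgroups from domain knowledge rather than a single hand-labeled attribute.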

Submitted 5 January, 2023; originally announced January 2023.

Comments: 21 pages

Journal ref: Proc. ACM Hum.-Comput. Interact. 7, CSCW1, Article 136 (April 2023)

arXiv:2210.08979

An Interactive Interpretability System for Breast Cancer Screening with Deep Learning

Authors: Yuzhe Lu, Adam Perer

Abstract: Deep learning methods, in particular convolutional neural networks, have emerged as a powerful tool in medical image computing tasks. While these complex models provide excellent performance, their black-box nature may hinder real-world adoption in high-stakes decision-making. In this paper, we propose an interactive system to take advantage of state-of-the-art interpretability techniques to assist radiologists with breast cancer screening. Our system integrates a deep learning model into the radiologists' workflow and provides novel interactions to promote understanding of the model's decision-making process. Moreover, we demonstrate that our system can take advantage of user interactions progressively to provide finer-grained explainability reports with little labeling overhead. Due to the generic nature of the adopted interpretability technique, our system is domain-agnostic and can be used for many different medical image computing tasks, presenting a novel perspective on how we can leverage visual analytics to transform originally static interpretability techniques to augment human decision making and promote the adoption of medical AI.

Submitted 29 September, 2022; originally announced October 2022.

arXiv:2204.13872

Extended Analysis of "How Child Welfare Workers Reduce Racial Disparities in Algorithmic Decisions"

Authors: Logan Stapleton, Hao-Fei Cheng, Anna Kawakami, Venkatesh Sivaraman, Yanghuidi Cheng, Diana Qing, Adam Perer, Kenneth Holstein, Zhiwei Steven Wu, Haiyi Zhu

Abstract: This is an extended analysis of our paper "How Child Welfare Workers Reduce Racial Disparities in Algorithmic Decisions," which looks at racial disparities in the Allegheny Family Screening Tool (AFST), an algorithm used to help child welfare workers decide which families the Allegheny County child welfare agency (CYF) should investigate. On April 27, 2022, Allegheny County CYF sent us an updated dataset and pre-processing steps. In this extended analysis, we show the results from re-running all quantitative analyses in our paper with this new data and pre-processing. We find that the main findings of our paper were robust to changes in data and pre-processing. In particular, the Allegheny Family Screening Tool on its own would have made more racially disparate decisions than workers, and workers used the tool to decrease those algorithmic disparities. Some minor results changed, including a slight increase in the screen-in rate from before to after the implementation of the AFST reported in our paper.

Submitted 29 April, 2022; originally announced April 2022.

arXiv:2204.10814

"Public(s)-in-the-Loop": Facilitating Deliberation of Algorithmic Decisions in Contentious Public Policy Domains

Authors: Hong Shen, Ángel Alexander Cabrera, Adam Perer, Jason Hong

Abstract: This position paper offers a framework to think about how to better involve human influence in algorithmic decision-making on contentious public policy issues. Drawing from insights in communication literature, we introduce a "public(s)-in-the-loop" approach and enumerate three features that are central to this approach: publics as plural political entities, collective decision-making through deliberation, and the construction of publics. The paper then explores how these features might advance our understanding of stakeholder participation in AI design in contentious public policy domains such as recidivism prediction. Finally, it sketches out part of a research agenda for the HCI community to support this work.

Submitted 22 April, 2022; originally announced April 2022.

Comments: 5 pages, 0 figures, accepted to CHI2020 Fair & Responsible AI Workshop

arXiv:2204.02310

Improving Human-AI Partnerships in Child Welfare: Understanding Worker Practices, Challenges, and Desires for Algorithmic Decision Support

Authors: Anna Kawakami, Venkatesh Sivaraman, Hao-Fei Cheng, Logan Stapleton, Yanghuidi Cheng, Diana Qing, Adam Perer, Zhiwei Steven Wu, Haiyi Zhu, Kenneth Holstein

Abstract: AI-based decision support tools (ADS) are increasingly used to augment human decision-making in high-stakes, social contexts. As public sector agencies begin to adopt ADS, it is critical that we understand workers' experiences with these systems in practice. In this paper, we present findings from a series of interviews and contextual inquiries at a child welfare agency, to understand how workers currently make AI-assisted child maltreatment screening decisions. Overall, we observe how workers' reliance upon the ADS is guided by (1) their knowledge of rich, contextual information beyond what the AI model captures, (2) their beliefs about the ADS's capabilities and limitations relative to their own, (3) organizational pressures and incentives around the use of the ADS, and (4) awareness of misalignments between algorithmic predictions and their own decision-making objectives. Drawing upon these findings, we discuss design implications towards supporting more effective human-AI decision-making.

Submitted 5 April, 2022; originally announced April 2022.

Comments: 2022 Conference on Human Factors in Computing Systems

ACM Class: K.4.0

arXiv:2202.02641

Emblaze: Illuminating Machine Learning Representations through Interactive Comparison of Embedding Spaces

Authors: Venkatesh Sivaraman, Yiwei Wu, Adam Perer

Abstract: Modern machine learning techniques commonly rely on complex, high-dimensional embedding representations to capture underlying structure in the data and improve performance. In order to characterize model flaws and choose a desirable representation, model builders often need to compare across multiple embedding spaces, a challenging analytical task supported by few existing tools. We first interviewed nine embedding experts in a variety of fields to characterize the diverse challenges they face and techniques they use when analyzing embedding spaces. Informed by these perspectives, we developed a novel system called Emblaze that integrates embedding space comparison within a computational notebook environment. Emblaze uses an animated, interactive scatter plot with a novel Star Trail augmentation to enable visual comparison. It also employs novel neighborhood analysis and clustering procedures to dynamically suggest groups of points with interesting changes between spaces. Through a series of case studies with ML experts, we demonstrate how interactive comparison with Emblaze can help gain new insights into embedding space structure.

Submitted 16 February, 2022; v1 submitted 5 February, 2022; originally announced February 2022.

Comments: 23 pages, 5 figures, 2 tables. To be presented at IUI'22. arXiv version updated Feb 16 2022 with corrected publication year and copyright

arXiv:2111.02626

Characterizing Human Explanation Strategies to Inform the Design of Explainable AI for Building Damage Assessment

Authors: Donghoon Shin, Sachin Grover, Kenneth Holstein, Adam Perer

Abstract: Explainable AI (XAI) is a promising means of supporting human-AI collaborations for high-stakes visual detection tasks, such as damage detection from satellite imagery, as fully automated approaches are unlikely to be perfectly safe and reliable. However, most existing XAI techniques are not informed by an understanding of humans' task-specific needs for explanations. Thus, we took a first step toward understanding what forms of XAI humans require in damage detection tasks. We conducted an online crowdsourced study to understand how people explain their own assessments when evaluating the severity of building damage based on satellite imagery. Through the study with 60 crowdworkers, we surfaced six major strategies that humans utilize to explain their visual damage assessments. We present implications of our findings for the design of XAI methods for such visual detection contexts, and discuss opportunities for future research.

Submitted 4 November, 2021; originally announced November 2021.

Comments: Accepted at NeurIPS 2021 Workshop on Artificial Intelligence for Humanitarian Assistance and Disaster Response (AI+HADR 2021)

arXiv:2109.11690

doi 10.1145/3479569

Discovering and Validating AI Errors With Crowdsourced Failure Reports

Authors: Ángel Alexander Cabrera, Abraham J. Druck, Jason I. Hong, Adam Perer

Abstract: AI systems can fail to learn important behaviors, leading to real-world issues like safety concerns and biases. Discovering these systematic failures often requires significant developer attention, from hypothesizing potential edge cases to collecting evidence and validating patterns. To scale and streamline this process, we introduce crowdsourced failure reports, end-user descriptions of how or why a model failed, and show how developers can use them to detect AI errors. We also design and implement Deblinder, a visual analytics system for synthesizing failure reports that developers can use to discover and validate systematic failures. In semi-structured interviews and think-aloud studies with 10 AI practitioners, we explore the affordances of the Deblinder system and the applicability of failure reports in real-world settings. Lastly, we show how collecting additional data from the groups identified by developers can improve model performance.

Submitted 23 September, 2021; originally announced September 2021.

arXiv:2103.11029

TextEssence: A Tool for Interactive Analysis of Semantic Shifts Between Corpora

Authors: Denis Newman-Griffis, Venkatesh Sivaraman, Adam Perer, Eric Fosler-Lussier, Harry Hochheiser

Abstract: Embeddings of words and concepts capture syntactic and semantic regularities of language; however, they have seen limited use as tools to study characteristics of different corpora and how they relate to one another. We introduce TextEssence, an interactive system designed to enable comparative analysis of corpora using embeddings. TextEssence includes visual, neighbor-based, and similarity-based modes of embedding analysis in a lightweight, web-based interface. We further propose a new measure of embedding confidence based on nearest neighborhood overlap, to assist in identifying high-quality embeddings for corpus analysis. A case study on COVID-19 scientific literature illustrates the utility of the system. TextEssence is available from https://github.com/drgriffis/text-essence.
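One plausible reading of an embedding confidence "based on nearest neighborhood overlap" is sketched below: train several replicate embeddings of the same corpus (for example, with different random seeds), collect a term's top-k nearest neighbors in each replicate, and average the pairwise Jaccard overlap of those neighbor sets. This is an assumption-laden illustration, not TextEssence's actual code (the linked repository has that); the replicate setup, cosine similarity, Jaccard overlap, `k`, and all function names are hypothetical choices.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_neighbors(space, term, k):
    """Set of the k terms closest to `term` (by cosine) in one embedding space."""
    query = space[term]
    scored = sorted(
        ((cosine(query, vec), other) for other, vec in space.items() if other != term),
        key=lambda pair: (-pair[0], pair[1]),  # highest similarity first, ties by name
    )
    return {other for _, other in scored[:k]}


def neighborhood_overlap_confidence(spaces, term, k=5):
    """Mean pairwise Jaccard overlap of `term`'s k-NN sets across replicate spaces.

    Near 1.0: the neighborhood is stable across replicates.
    Near 0.0: the neighborhood changes from run to run.
    """
    if len(spaces) < 2:
        raise ValueError("need at least two embedding replicates")
    neighbor_sets = [top_k_neighbors(space, term, k) for space in spaces]
    pairs = list(combinations(neighbor_sets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)
```

Each `space` here is a plain `dict` from term to vector; a real corpus would use a vector library and approximate nearest-neighbor search, but the overlap computation itself stays the same.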

Submitted 19 March, 2021; originally announced March 2021.

Comments: Accepted as a Systems Demonstration at NAACL-HLT 2021. Video demonstration at https://youtu.be/1xEEfsMwL0k

arXiv:1908.00387

Ablate, Variate, and Contemplate: Visual Analytics for Discovering Neural Architectures

Authors: Dylan Cashman, Adam Perer, Remco Chang, Hendrik Strobelt

Abstract: Deep learning models require the configuration of many layers and parameters in order to get good results. However, there are currently few systematic guidelines for how to configure a successful model. This means model builders often have to experiment with different configurations by manually programming different architectures (which is tedious and time-consuming) or rely on purely automated approaches to generate and train the architectures (which is expensive). In this paper, we present Rapid Exploration of Model Architectures and Parameters, or REMAP, a visual analytics tool that allows a model builder to discover a deep learning model quickly via exploration and rapid experimentation of neural network architectures. In REMAP, the user explores the large and complex parameter space for neural network architectures using a combination of global inspection and local experimentation. Through a visual overview of a set of models, the user identifies interesting clusters of architectures. Based on their findings, the user can run ablation and variation experiments to identify the effects of adding, removing, or replacing layers in a given architecture and generate new models accordingly. They can also handcraft new models using a simple graphical interface. As a result, a model builder can build deep learning models quickly, efficiently, and without manual programming. We inform the design of REMAP through a design study with four deep learning model builders. Through a use case, we demonstrate that REMAP allows users to discover performant neural network architectures efficiently using visual exploration and user-defined semi-automated searches through the model space.

Submitted 30 July, 2019; originally announced August 2019.

arXiv:1902.06787

Regularizing Black-box Models for Improved Interpretability

Authors: Gregory Plumb, Maruan Al-Shedivat, Angel Alexander Cabrera, Adam Perer, Eric Xing, Ameet Talwalkar

Abstract: Most of the work on interpretable machine learning has focused on designing either inherently interpretable models, which typically trade off accuracy for interpretability, or post-hoc explanation systems, whose explanation quality can be unpredictable. Our method, ExpO, is a hybridization of these approaches that regularizes a model for explanation quality at training time. Importantly, these regularizers are differentiable, model agnostic, and require no domain knowledge to define. We demonstrate that post-hoc explanations for ExpO-regularized models have better explanation quality, as measured by the common fidelity and stability metrics. We verify that improving these metrics leads to significantly more useful explanations with a user study on a realistic task.
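The fidelity metric mentioned above is commonly framed as neighborhood fidelity: how well a local linear explanation fit around a point reproduces the model's outputs on nearby perturbed points (lower error means a more faithful explanation). The sketch below is a simplified, hypothetical illustration of that metric only, not the ExpO regularizer itself, which the abstract describes as differentiable and applied at training time. It assumes a single input feature, a closed-form least-squares fit, Gaussian perturbations of scale `sigma`, and evaluation on the same perturbed points used for fitting; all names and defaults are illustrative.

```python
import random


def local_linear_fit(xs, ys):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def neighborhood_fidelity(model, x, sigma=0.5, n_samples=200, seed=0):
    """Mean squared error between `model` and a local linear explanation
    fit on Gaussian perturbations around `x` (0.0 = perfectly faithful).

    One-feature sketch; real models take feature vectors.
    """
    rng = random.Random(seed)
    xs = [x + rng.gauss(0.0, sigma) for _ in range(n_samples)]
    ys = [model(xi) for xi in xs]
    intercept, slope = local_linear_fit(xs, ys)
    residuals = [(y - (intercept + slope * xi)) ** 2 for xi, y in zip(xs, ys)]
    return sum(residuals) / n_samples
```

A linear model is explained perfectly by a linear surrogate, so its error is essentially zero, while a curved model such as `lambda v: v * v` yields a positive error that grows with the neighborhood size `sigma`. A regularizer in the spirit of ExpO would push the trained model toward the low-error regime, but the details of how it does so are in the paper, not in this sketch.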

Submitted 8 November, 2020; v1 submitted 18 February, 2019; originally announced February 2019.

arXiv:1804.09299

Seq2Seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models

Authors: Hendrik Strobelt, Sebastian Gehrmann, Michael Behrisch, Adam Perer, Hanspeter Pfister, Alexander M. Rush

Abstract: Neural Sequence-to-Sequence models have proven to be accurate and robust for many sequence prediction tasks, and have become the standard approach for automatic translation of text. The models work in a five-stage black-box process that involves encoding a source sequence to a vector space and then decoding out to a new target sequence. This process is now standard, but like many deep learning methods remains quite difficult to understand or debug. In this work, we present a visual analysis tool that allows interaction with a trained sequence-to-sequence model through each stage of the translation process. The aim is to identify which patterns have been learned and to detect model errors. We demonstrate the utility of our tool through several real-world large-scale sequence-to-sequence use cases.

Submitted 16 October, 2018; v1 submitted 24 April, 2018; originally announced April 2018.

Comments: VAST - IEEE VIS 2018

arXiv:1606.05685

Using Visual Analytics to Interpret Predictive Machine Learning Models

Authors: Josua Krause, Adam Perer, Enrico Bertini

Abstract: It is commonly believed that increasing the interpretability of a machine learning model may decrease its predictive power. However, inspecting input-output relationships of those models using visual analytics, while treating them as black boxes, can help to understand the reasoning behind outcomes without sacrificing predictive quality. We identify a space of possible solutions and provide two examples of where such techniques have been successfully used in practice.

Submitted 21 June, 2016; v1 submitted 17 June, 2016; originally announced June 2016.

Comments: presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY

Showing 1–19 of 19 results for author: Perer, A