-
An Empirical Study of Counterfactual Visualization to Support Visual Causal Inference
Authors:
Arran Zeyu Wang,
David Borland,
David Gotz
Abstract:
Counterfactuals -- expressing what might have been true under different circumstances -- have been widely applied in statistics and machine learning to help understand causal relationships. More recently, counterfactuals have begun to emerge as a technique being applied within visualization research. However, it remains unclear to what extent counterfactuals can aid with visual data communication.…
▽ More
Counterfactuals -- expressing what might have been true under different circumstances -- have been widely applied in statistics and machine learning to help understand causal relationships. More recently, counterfactuals have begun to emerge as a technique being applied within visualization research. However, it remains unclear to what extent counterfactuals can aid with visual data communication. In this paper, we primarily focus on assessing the quality of users' understanding of data when provided with counterfactual visualizations. We propose a preliminary model of causality comprehension by connecting theories from causal inference and visual data communication. Leveraging this model, we conducted an empirical study to explore how counterfactuals can improve users' understanding of data in static visualizations. Our results indicate that visualizing counterfactuals had a positive impact on participants' interpretations of causal relations within datasets. These results motivate a discussion of how to more effectively incorporate counterfactuals into data visualizations.
△ Less
Submitted 16 January, 2024;
originally announced January 2024.
-
Using Counterfactuals to Improve Causal Inferences from Visualizations
Authors:
David Borland,
Arran Zeyu Wang,
David Gotz
Abstract:
Traditional approaches to data visualization have often focused on comparing different subsets of data, and this is reflected in the many techniques developed and evaluated over the years for visual comparison. Similarly, common workflows for exploratory visualization are built upon the idea of users interactively applying various filter and grou** mechanisms in search of new insights. This para…
▽ More
Traditional approaches to data visualization have often focused on comparing different subsets of data, and this is reflected in the many techniques developed and evaluated over the years for visual comparison. Similarly, common workflows for exploratory visualization are built upon the idea of users interactively applying various filter and grou** mechanisms in search of new insights. This paradigm has proven effective at hel** users identify correlations between variables that can inform thinking and decision-making. However, recent studies show that consumers of visualizations often draw causal conclusions even when not supported by the data. Motivated by these observations, this article highlights recent advances from a growing community of researchers exploring methods that aim to directly support visual causal inference. However, many of these approaches have their own limitations which limit their use in many real-world scenarios. This article therefore also outlines a set of key open challenges and corresponding priorities for new research to advance the state of the art in visual causal inference.
△ Less
Submitted 16 January, 2024;
originally announced January 2024.
-
Enabling Longitudinal Exploratory Analysis of Clinical COVID Data
Authors:
David Borland,
Irena Brain,
Karamarie Fecho,
Emily Pfaff,
Hao Xu,
James Champion,
Chris Bizon,
David Gotz
Abstract:
As the COVID-19 pandemic continues to impact the world, data is being gathered and analyzed to better understand the disease. Recognizing the potential for visual analytics technologies to support exploratory analysis and hypothesis generation from longitudinal clinical data, a team of collaborators worked to apply existing event sequence visual analytics technologies to a longitudinal clinical da…
▽ More
As the COVID-19 pandemic continues to impact the world, data is being gathered and analyzed to better understand the disease. Recognizing the potential for visual analytics technologies to support exploratory analysis and hypothesis generation from longitudinal clinical data, a team of collaborators worked to apply existing event sequence visual analytics technologies to a longitudinal clinical data from a cohort of 998 patients with high rates of COVID-19 infection. This paper describes the initial steps toward this goal, including: (1) the data transformation and processing work required to prepare the data for visual analysis, (2) initial findings and observations, and (3) qualitative feedback and lessons learned which highlight key features as well as limitations to address in future work.
△ Less
Submitted 25 August, 2021;
originally announced August 2021.
-
Improving Visualization Interpretation Using Counterfactuals
Authors:
Smiti Kaul,
David Borland,
Nan Cao,
David Gotz
Abstract:
Complex, high-dimensional data is used in a wide range of domains to explore problems and make decisions. Analysis of high-dimensional data, however, is vulnerable to the hidden influence of confounding variables, especially as users apply ad hoc filtering operations to visualize only specific subsets of an entire dataset. Thus, visual data-driven analysis can mislead users and encourage mistaken…
▽ More
Complex, high-dimensional data is used in a wide range of domains to explore problems and make decisions. Analysis of high-dimensional data, however, is vulnerable to the hidden influence of confounding variables, especially as users apply ad hoc filtering operations to visualize only specific subsets of an entire dataset. Thus, visual data-driven analysis can mislead users and encourage mistaken assumptions about causality or the strength of relationships between features. This work introduces a novel visual approach designed to reveal the presence of confounding variables via counterfactual possibilities during visual data analysis. It is implemented in CoFact, an interactive visualization prototype that determines and visualizes \textit{counterfactual subsets} to better support user exploration of feature relationships. Using publicly available datasets, we conducted a controlled user study to demonstrate the effectiveness of our approach; the results indicate that users exposed to counterfactual visualizations formed more careful judgments about feature-to-outcome relationships.
△ Less
Submitted 21 July, 2021;
originally announced July 2021.
-
Selection-Bias-Corrected Visualization via Dynamic Reweighting
Authors:
David Borland,
Jonathan Zhang,
Smiti Kaul,
David Gotz
Abstract:
The collection and visual analysis of large-scale data from complex systems, such as electronic health records or clickstream data, has become increasingly common across a wide range of industries. This type of retrospective visual analysis, however, is prone to a variety of selection bias effects, especially for high-dimensional data where only a subset of dimensions is visualized at any given ti…
▽ More
The collection and visual analysis of large-scale data from complex systems, such as electronic health records or clickstream data, has become increasingly common across a wide range of industries. This type of retrospective visual analysis, however, is prone to a variety of selection bias effects, especially for high-dimensional data where only a subset of dimensions is visualized at any given time. The risk of selection bias is even higher when analysts dynamically apply filters or perform grou** operations during ad hoc analyses. These bias effects threatens the validity and generalizability of insights discovered during visual analysis as the basis for decision making. Past work has focused on bias transparency, hel** users understand when selection bias may have occurred. However, countering the effects of selection bias via bias mitigation is typically left for the user to accomplish as a separate process. Dynamic reweighting (DR) is a novel computational approach to selection bias mitigation that helps users craft bias-corrected visualizations. This paper describes the DR workflow, introduces key DR visualization designs, and presents statistical methods that support the DR process. Use cases from the medical domain, as well as findings from domain expert user interviews, are also reported.
△ Less
Submitted 24 August, 2020; v1 submitted 29 July, 2020;
originally announced July 2020.
-
Selection Bias Tracking and Detailed Subset Comparison for High-Dimensional Data
Authors:
David Borland,
Wenyuan Wang,
Jonathan Zhang,
Joshua Shrestha,
David Gotz
Abstract:
The collection of large, complex datasets has become common across a wide variety of domains. Visual analytics tools increasingly play a key role in exploring and answering complex questions about these large datasets. However, many visualizations are not designed to concurrently visualize the large number of dimensions present in complex datasets (e.g. tens of thousands of distinct codes in an el…
▽ More
The collection of large, complex datasets has become common across a wide variety of domains. Visual analytics tools increasingly play a key role in exploring and answering complex questions about these large datasets. However, many visualizations are not designed to concurrently visualize the large number of dimensions present in complex datasets (e.g. tens of thousands of distinct codes in an electronic health record system). This fact, combined with the ability of many visual analytics systems to enable rapid, ad-hoc specification of groups, or cohorts, of individuals based on a small subset of visualized dimensions, leads to the possibility of introducing selection bias--when the user creates a cohort based on a specified set of dimensions, differences across many other unseen dimensions may also be introduced. These unintended side effects may result in the cohort no longer being representative of the larger population intended to be studied, which can negatively affect the validity of subsequent analyses. We present techniques for selection bias tracking and visualization that can be incorporated into high-dimensional exploratory visual analytics systems, with a focus on medical data with existing data hierarchies. These techniques include: (1) tree-based cohort provenance and visualization, with a user-specified baseline cohort that all other cohorts are compared against, and visual encoding of the drift for each cohort, which indicates where selection bias may have occurred, and (2) a set of visualizations, including a novel icicle-plot based visualization, to compare in detail the per-dimension differences between the baseline and a user-specified focus cohort. These techniques are integrated into a medical temporal event sequence visual analytics tool. We present example use cases and report findings from domain expert user interviews.
△ Less
Submitted 17 June, 2020; v1 submitted 18 June, 2019;
originally announced June 2019.
-
Visual Analysis of High-Dimensional Event Sequence Data via Dynamic Hierarchical Aggregation
Authors:
David Gotz,
Jonathan Zhang,
Wenyuan Wang,
Joshua Shrestha,
David Borland
Abstract:
Temporal event data are collected across a broad range of domains, and a variety of visual analytics techniques have been developed to empower analysts working with this form of data. These techniques generally display aggregate statistics computed over sets of event sequences that share common patterns. Such techniques are often hindered, however, by the high-dimensionality of many real-world eve…
▽ More
Temporal event data are collected across a broad range of domains, and a variety of visual analytics techniques have been developed to empower analysts working with this form of data. These techniques generally display aggregate statistics computed over sets of event sequences that share common patterns. Such techniques are often hindered, however, by the high-dimensionality of many real-world event sequence datasets because the large number of distinct event types within such data prevents effective aggregation. A common co** strategy for this challenge is to group event types together as a pre-process, prior to visualization, so that each group can be represented within an analysis as a single event type. However, computing these event grou**s as a pre-process also places significant constraints on the analysis. This paper presents a dynamic hierarchical aggregation technique that leverages a predefined hierarchy of dimensions to computationally quantify the informativeness of alternative levels of grou** within the hierarchy at runtime. This allows users to dynamically explore the hierarchy to select the most appropriate level of grou** to use at any individual step within an analysis. Key contributions include an algorithm for interactively determining the most informative set of event grou**s from within a large-scale hierarchy of event types, and a scatter-plus-focus visualization that supports interactive hierarchical exploration. While these contributions are generalizable to other types of problems, we apply them to high-dimensional event sequence analysis using large-scale event type hierarchies from the medical domain. We describe their use within a medical cohort analysis tool called Cadence, demonstrate an example in which the proposed technique supports better views of event sequence data, and report findings from domain expert interviews.
△ Less
Submitted 11 July, 2019; v1 submitted 18 June, 2019;
originally announced June 2019.