-
CAVA: A Visual Analytics System for Exploratory Columnar Data Augmentation Using Knowledge Graphs
Authors:
Dylan Cashman,
Shenyu Xu,
Subhajit Das,
Florian Heimerl,
Cong Liu,
Shah Rukh Humayoun,
Michael Gleicher,
Alex Endert,
Remco Chang
Abstract:
Most visual analytics systems assume that all foraging for data happens before the analytics process; once analysis begins, the set of data attributes considered is fixed. Such separation of data construction from analysis precludes iteration that can enable foraging informed by the needs that arise in-situ during the analysis. The separation of the foraging loop from the data analysis tasks can limit the pace and scope of analysis. In this paper, we present CAVA, a system that integrates data curation and data augmentation with the traditional data exploration and analysis tasks, enabling information foraging in-situ during analysis. Identifying attributes to add to the dataset is difficult because it requires human knowledge to determine which available attributes will be helpful for the ensuing analytical tasks. CAVA crawls knowledge graphs to provide users with a broad set of attributes drawn from external data to choose from. Users can then specify complex operations on knowledge graphs to construct additional attributes. CAVA shows how visual analytics can help users forage for attributes by letting users visually explore the set of available data, and by serving as an interface for query construction. It also provides visualizations of the knowledge graph itself to help users understand complex joins such as multi-hop aggregations. We assess the ability of our system to enable users to perform complex data combinations without programming in a user study over two datasets. We then demonstrate the generalizability of CAVA through two additional usage scenarios. The results of the evaluation confirm that CAVA is effective in helping the user perform data foraging that leads to improved analysis outcomes, and offer evidence in support of integrating data augmentation as a part of the visual analytics pipeline.
Submitted 6 September, 2020;
originally announced September 2020.
-
Boxer: Interactive Comparison of Classifier Results
Authors:
Michael Gleicher,
Aditya Barve,
Xinyi Yu,
Florian Heimerl
Abstract:
Machine learning practitioners often compare the results of different classifiers to help select, diagnose and tune models. We present Boxer, a system to enable such comparison. Our system facilitates interactive exploration of the experimental results obtained by applying multiple classifiers to a common set of model inputs. The approach focuses on allowing the user to identify interesting subsets of training and testing instances and comparing performance of the classifiers on these subsets. The system couples standard visual designs with set algebra interactions and comparative elements. This allows the user to compose and coordinate views to specify subsets and assess classifier performance on them. The flexibility of these compositions allows the user to address a wide range of scenarios in developing and assessing classifiers. We demonstrate Boxer in use cases including model selection, tuning, fairness assessment, and data quality diagnosis.
Submitted 16 April, 2020;
originally announced April 2020.
-
embComp: Visual Interactive Comparison of Vector Embeddings
Authors:
Florian Heimerl,
Christoph Kralj,
Torsten Möller,
Michael Gleicher
Abstract:
This paper introduces embComp, a novel approach for comparing two embeddings that capture the similarity between objects, such as word and document embeddings. We survey scenarios where comparing these embedding spaces is useful. From those scenarios, we derive common tasks, introduce visual analysis methods that support these tasks, and combine them into a comprehensive system. One of embComp's central features is a set of overview visualizations based on metrics that measure differences in the local structure around objects. Summarizing these local metrics over the embeddings provides global overviews of similarities and differences. Detail views allow comparison of the local structure around selected objects and relating this local information to the global views. Integrating and connecting all of these components, embComp supports a range of analysis workflows that help understand similarities and differences between embedding spaces. We assess our approach by applying it in several use cases, including understanding corpora differences via word vector embeddings, and understanding algorithmic differences in generating embeddings.
Submitted 1 June, 2021; v1 submitted 4 November, 2019;
originally announced November 2019.
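As a rough illustration of what a metric on "the local structure around objects" can look like, the sketch below computes the Jaccard overlap between an object's k nearest neighbors in two embeddings. This is an assumption made for illustration only: the abstract does not name its metrics, and the function names (`nearest_neighbors`, `neighborhood_overlap`), the use of Euclidean distance, and the toy data are all hypothetical.

```python
import math


def nearest_neighbors(embedding, key, k):
    """Return the set of the k objects closest to `key` (Euclidean distance).

    embedding: dict mapping object name -> coordinate tuple.
    Ties are broken by object name so the result is deterministic.
    """
    others = [name for name in embedding if name != key]
    others.sort(key=lambda name: (math.dist(embedding[key], embedding[name]), name))
    return set(others[:k])


def neighborhood_overlap(emb_a, emb_b, key, k):
    """Jaccard overlap of the k-nearest-neighbor sets of `key` in two embeddings.

    1.0 means the local neighborhood is identical in both spaces,
    0.0 means the two neighborhoods share no objects.
    """
    if k < 1 or len(emb_a) < 2 or len(emb_b) < 2:
        raise ValueError("need k >= 1 and at least two objects per embedding")
    near_a = nearest_neighbors(emb_a, key, k)
    near_b = nearest_neighbors(emb_b, key, k)
    return len(near_a & near_b) / len(near_a | near_b)


# Hypothetical toy embeddings over the same four objects.
emb_a = {"cat": (0, 0), "dog": (1, 0), "car": (5, 5), "bus": (6, 5)}
emb_b = {"cat": (0, 0), "dog": (1, 0), "car": (5, 5), "bus": (0, 0.5)}

# Per-object scores; summarizing them over all objects gives a global overview.
scores = {name: neighborhood_overlap(emb_a, emb_b, name, k=2) for name in emb_a}
```

Objects with low scores are those whose neighborhoods changed most between the two spaces, which is the kind of local signal an overview visualization could aggregate.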
-
Visual Designs for Binned Aggregation of Multi-Class Scatterplots
Authors:
Florian Heimerl,
Chih-Ching Chang,
Alper Sarikaya,
Michael Gleicher
Abstract:
Point sets in 2D with multiple classes are a common type of data. The canonical visualization design for such data is the scatterplot, which does not scale to large collections of points. For these larger data sets, binned aggregation (or binning) is often used to summarize the data, with many possible design alternatives for creating effective visual representations of these summaries. There is a wide range of designs to show summaries of 2D multi-class point data, each capable of supporting different analysis tasks. In this paper, we explore the space of visual designs for such data, and provide design guidelines for different analysis scenarios. To support these guidelines, we compile a set of abstract tasks and ground them in concrete examples using multiple sample datasets. We then assess designs, and survey a range of design decisions, considering their appropriateness to the tasks. In addition, we provide a web-based implementation to experiment with design choices, supporting the validation of designs based on task needs.
Submitted 14 January, 2020; v1 submitted 4 October, 2018;
originally announced October 2018.
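The binned aggregation mentioned in the abstract above can be sketched in a few lines: each 2D point is assigned to a square grid cell, and a per-class count is kept for every cell. This is a minimal sketch under stated assumptions, not the paper's implementation; the function names (`bin_points`, `majority_class`), the square-grid layout, and the sample points are hypothetical.

```python
from collections import Counter, defaultdict
from math import floor


def bin_points(points, bin_size):
    """Count points per class in each square bin.

    points: iterable of (x, y, label) tuples.
    Returns a dict mapping (column, row) -> Counter of label counts.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    bins = defaultdict(Counter)
    for x, y, label in points:
        # floor() keeps negative coordinates in their own bins.
        cell = (floor(x / bin_size), floor(y / bin_size))
        bins[cell][label] += 1
    return dict(bins)


def majority_class(counts):
    """Return the most frequent label in a bin; ties go to the label that sorts first."""
    return min(counts, key=lambda label: (-counts[label], label))


# Hypothetical sample data: two classes, "a" and "b".
points = [
    (0.2, 0.3, "a"),
    (0.8, 0.1, "a"),
    (0.5, 0.9, "b"),
    (1.4, 0.2, "b"),
    (-0.1, 0.5, "a"),
]
summary = bin_points(points, bin_size=1.0)
```

The per-bin counts are the raw summary; a visual design then decides how to encode them, for example by coloring each bin by `majority_class` or by drawing a small chart of the class proportions inside it.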
-
A User-based Visual Analytics Workflow for Exploratory Model Analysis
Authors:
Dylan Cashman,
Shah Rukh Humayoun,
Florian Heimerl,
Kendall Park,
Subhajit Das,
John Thompson,
Bahador Saket,
Abigail Mosca,
John Stasko,
Alex Endert,
Michael Gleicher,
Remco Chang
Abstract:
Many visual analytics systems allow users to interact with machine learning models towards the goals of data exploration and insight generation on a given dataset. However, in some situations, insights may be less important than the production of an accurate predictive model for future use. In that case, users are more interested in generating diverse and robust predictive models, verifying their performance on holdout data, and selecting the most suitable model for their usage scenario. In this paper, we consider the concept of Exploratory Model Analysis (EMA), which is defined as the process of discovering and selecting relevant models that can be used to make predictions on a data source. We delineate the differences between EMA and the well-known term exploratory data analysis in terms of the desired outcome of the analytic process: insights into the data or a set of deployable models. The contributions of this work are a visual analytics system workflow for EMA, a user study, and two use cases validating the effectiveness of the workflow. We found that our system workflow enabled users to generate complex models, to assess them for various qualities, and to select the most relevant model for their task.
Submitted 29 July, 2019; v1 submitted 27 September, 2018;
originally announced September 2018.