-
Feature Learning for Nonlinear Dimensionality Reduction toward Maximal Extraction of Hidden Patterns
Authors:
Takanori Fujiwara,
Yun-Hsin Kuo,
Anders Ynnerman,
Kwan-Liu Ma
Abstract:
Dimensionality reduction (DR) plays a vital role in the visual analysis of high-dimensional data. One main aim of DR is to reveal hidden patterns that lie on intrinsic low-dimensional manifolds. However, DR often overlooks important patterns when the manifolds are distorted or masked by certain influential data attributes. This paper presents a feature learning framework, FEALM, designed to generate a set of optimized data projections for nonlinear DR in order to capture important patterns in the hidden manifolds. These projections produce maximally different nearest-neighbor graphs so that resultant DR outcomes are significantly different. To achieve such a capability, we design an optimization algorithm as well as introduce a new graph dissimilarity measure, named neighbor-shape dissimilarity. Additionally, we develop interactive visualizations to assist comparison of obtained DR results and interpretation of each DR result. We demonstrate FEALM's effectiveness through experiments and case studies using synthetic and real-world datasets.
Submitted 24 February, 2023; v1 submitted 28 June, 2022;
originally announced June 2022.
-
Interactive Dimensionality Reduction for Comparative Analysis
Authors:
Takanori Fujiwara,
Xinhai Wei,
Jian Zhao,
Kwan-Liu Ma
Abstract:
Finding the similarities and differences between groups of datasets is a fundamental analysis task. For high-dimensional data, dimensionality reduction (DR) methods are often used to find the characteristics of each group. However, existing DR methods provide limited capability and flexibility for such comparative analysis as each method is designed only for a narrow analysis target, such as identifying factors that most differentiate groups. This paper presents an interactive DR framework where we integrate our new DR method, called ULCA (unified linear comparative analysis), with an interactive visual interface. ULCA unifies two DR schemes, discriminant analysis and contrastive learning, to support various comparative analysis tasks. To provide flexibility for comparative analysis, we develop an optimization algorithm that enables analysts to interactively refine ULCA results. Additionally, the interactive visualization interface facilitates interpretation and refinement of the ULCA results. We evaluate ULCA and the optimization algorithm to show their efficiency as well as present multiple case studies using real-world datasets to demonstrate the usefulness of this framework.
Submitted 27 October, 2021; v1 submitted 29 June, 2021;
originally announced June 2021.
-
Contrastive Multiple Correspondence Analysis (cMCA): Using Contrastive Learning to Identify Latent Subgroups in Political Parties
Authors:
Takanori Fujiwara,
Tzu-Ping Liu
Abstract:
Scaling methods have long been utilized to simplify and cluster high-dimensional data. However, the general latent spaces across all predefined groups derived from these methods sometimes do not fall into researchers' interest regarding specific patterns within groups. To tackle this issue, we adopt an emerging analysis approach called contrastive learning. We contribute to this growing field by extending its ideas to multiple correspondence analysis (MCA) in order to enable an analysis of data often encountered by social scientists -- containing binary, ordinal, and nominal variables. We demonstrate the utility of contrastive MCA (cMCA) by analyzing two different surveys of voters in the U.S. and U.K. Our results suggest that, first, cMCA can identify substantively important dimensions and divisions among subgroups that are overlooked by traditional methods; second, for other cases, cMCA can derive latent traits that emphasize subgroups seen moderately in those derived by traditional methods.
Submitted 1 June, 2023; v1 submitted 8 July, 2020;
originally announced July 2020.
-
Network Comparison with Interpretable Contrastive Network Representation Learning
Authors:
Takanori Fujiwara,
Jian Zhao,
Francine Chen,
Yaoliang Yu,
Kwan-Liu Ma
Abstract:
Identifying unique characteristics in a network through comparison with another network is an essential network analysis task. For example, with networks of protein interactions obtained from normal and cancer tissues, we can discover unique types of interactions in cancer tissues. This analysis task could be greatly assisted by contrastive learning, which is an emerging analysis approach to discover salient patterns in one dataset relative to another. However, existing contrastive learning methods cannot be directly applied to networks as they are designed only for high-dimensional data analysis. To address this problem, we introduce a new analysis approach called contrastive network representation learning (cNRL). By integrating two machine learning schemes, network representation learning and contrastive learning, cNRL enables embedding of network nodes into a low-dimensional representation that reveals the uniqueness of one network compared to another. Within this approach, we also design a method, named i-cNRL, which offers interpretability in the learned results, allowing for understanding which specific patterns are only found in one network. We demonstrate the effectiveness of i-cNRL for network comparison with multiple network models and real-world datasets. Furthermore, we compare i-cNRL and other potential cNRL algorithm designs through quantitative and qualitative evaluations.
Submitted 15 February, 2022; v1 submitted 25 May, 2020;
originally announced May 2020.
-
A Visual Analytics System for Multi-model Comparison on Clinical Data Predictions
Authors:
Yiran Li,
Takanori Fujiwara,
Yong K. Choi,
Katherine K. Kim,
Kwan-Liu Ma
Abstract:
There is a growing trend of applying machine learning methods to medical datasets in order to predict patients' future status. Although some of these methods achieve high performance, challenges still exist in comparing and evaluating different models through their interpretable information. Such analytics can help clinicians improve evidence-based medical decision making. In this work, we develop a visual analytics system that compares multiple models' prediction criteria and evaluates their consistency. With our system, users can generate knowledge on different models' inner criteria and how confidently we can rely on each model's prediction for a certain patient. Through a case study of a publicly available clinical dataset, we demonstrate the effectiveness of our visual analytics system to assist clinicians and researchers in comparing and quantitatively evaluating different machine learning methods.
Submitted 23 March, 2020; v1 submitted 18 February, 2020;
originally announced February 2020.
-
Comparative Visual Analytics for Assessing Medical Records with Sequence Embedding
Authors:
Rongchen Guo,
Takanori Fujiwara,
Yiran Li,
Kelly M. Lima,
Soman Sen,
Nam K. Tran,
Kwan-Liu Ma
Abstract:
Machine learning for data-driven diagnosis has been actively studied in medicine to provide better healthcare. Supporting analysis of a patient cohort similar to a patient under treatment is a key task for clinicians to make decisions with high confidence. However, such analysis is not straightforward due to the characteristics of medical records: high dimensionality, irregularity in time, and sparsity. To address this challenge, we introduce a method for similarity calculation of medical records. Our method employs event and sequence embeddings. While we use an autoencoder for the event embedding, we apply its variant with the self-attention mechanism for the sequence embedding. Moreover, in order to better handle the irregularity of data, we enhance the self-attention mechanism with consideration of different time intervals. We have developed a visual analytics system to support comparative studies of patient records. To make a comparison of sequences with different lengths easier, our system incorporates a sequence alignment method. Through its interactive interface, the user can quickly identify patients of interest and conveniently review both the temporal and multivariate aspects of the patient records. We demonstrate the effectiveness of our design and system with case studies using a real-world dataset from the neonatal intensive care unit of UC Davis.
Submitted 23 March, 2020; v1 submitted 18 February, 2020;
originally announced February 2020.
-
Supporting Analysis of Dimensionality Reduction Results with Contrastive Learning
Authors:
Takanori Fujiwara,
Oh-Hyun Kwon,
Kwan-Liu Ma
Abstract:
Dimensionality reduction (DR) is frequently used for analyzing and visualizing high-dimensional data as it provides a good first glance of the data. However, to interpret the DR result for gaining useful insights from the data, it would take additional analysis effort such as identifying clusters and understanding their characteristics. While there are many automatic methods (e.g., density-based clustering methods) to identify clusters, effective methods for understanding a cluster's characteristics are still lacking. A cluster can be mostly characterized by its distribution of feature values. Reviewing the original feature values is not a straightforward task when the number of features is large. To address this challenge, we present a visual analytics method that effectively highlights the essential features of a cluster in a DR result. To extract the essential features, we introduce an enhanced usage of contrastive principal component analysis (cPCA). Our method, called ccPCA (contrasting clusters in PCA), can calculate each feature's relative contribution to the contrast between one cluster and other clusters. With ccPCA, we have created an interactive system including a scalable visualization of clusters' feature contributions. We demonstrate the effectiveness of our method and system with case studies using several publicly available datasets.
Submitted 14 October, 2019; v1 submitted 9 May, 2019;
originally announced May 2019.
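Note on the last entry: the abstract says ccPCA builds on contrastive PCA (cPCA) to find each feature's contribution to the contrast between one cluster and the others. A minimal sketch of plain cPCA follows. It is not the authors' ccPCA implementation. The function name `cpca_directions`, the fixed contrast parameter `alpha`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def cpca_directions(target, background, alpha=1.0, n_components=1):
    """Plain cPCA: top eigenvectors of cov(target) - alpha * cov(background).

    Hypothetical helper for illustration; ccPCA itself adds steps that are
    not described in the abstract and are not reproduced here.
    """
    # Center each group on its own mean before computing covariances.
    t = target - target.mean(axis=0)
    b = background - background.mean(axis=0)
    cov_t = t.T @ t / (len(t) - 1)
    cov_b = b.T @ b / (len(b) - 1)
    # The contrast matrix is symmetric, so eigh applies.
    eigvals, eigvecs = np.linalg.eigh(cov_t - alpha * cov_b)
    # Keep the directions with the largest contrastive variance.
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]


# Synthetic stand-in for "one cluster vs. the other clusters" (assumed data):
# feature 0 varies strongly in both groups, feature 1 only in the target.
rng = np.random.default_rng(0)
background = rng.normal(size=(1000, 3)) * np.array([3.0, 0.1, 0.1])
target = rng.normal(size=(1000, 3)) * np.array([3.0, 2.0, 0.1])

components = cpca_directions(target, background, alpha=1.0, n_components=1)
# Absolute loadings act as a rough per-feature contribution to the contrast.
loadings = np.abs(components[:, 0])
```

In this sketch, the feature with the largest absolute loading is the one whose variance is high in the target group but not in the background, which is the kind of signal a per-feature contribution view would highlight.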